From jwilhelm at openjdk.org Fri Jul 1 00:52:48 2022
From: jwilhelm at openjdk.org (Jesper Wilhelmsson)
Date: Fri, 1 Jul 2022 00:52:48 GMT
Subject: Integrated: Merge jdk19
In-Reply-To:
References:
Message-ID:

On Thu, 30 Jun 2022 23:50:57 GMT, Jesper Wilhelmsson wrote:

> Forwardport JDK 19 -> JDK 20

This pull request has now been integrated.

Changeset: 918068a1
Author: Jesper Wilhelmsson
URL: https://git.openjdk.org/jdk/commit/918068a115efee7d439084b6d743cab5193bd943
Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod

Merge

-------------

PR: https://git.openjdk.org/jdk/pull/9341

From dholmes at openjdk.org Fri Jul 1 12:34:43 2022
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 1 Jul 2022 12:34:43 GMT
Subject: RFR: 8289534: Change 'uncomplicated' hotspot runtime options
In-Reply-To:
References:
Message-ID:

On Thu, 30 Jun 2022 18:39:58 GMT, Harold Seigel wrote:

> Please review this small fix to change range constrained JVM runtime options from 64 bits to 32 bits. This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>
> Thanks, Harold

The changes seem fine, though the change to ObjectAlignmentInBytes is a bit disruptive.

I don't really see a motivation for this though - the memory saving seems insignificant.

src/hotspot/share/runtime/perfMemory.cpp line 2:

> 1: /*
> 2:  * Copyright (c) 2001, 2021, Oracle and/or its affiliates. All rights reserved.

2022

-------------

Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9338

From tholenstein at openjdk.org Fri Jul 1 13:34:43 2022
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Fri, 1 Jul 2022 13:34:43 GMT
Subject: RFR: JDK-8277060: EXCEPTION_INT_DIVIDE_BY_ZERO in TypeAryPtr::dump2 with -XX:+TracePhaseCCP [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 29 Jun 2022 18:38:39 GMT, Vladimir Kozlov wrote:

>> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   stronger assert
>
> Good.

@vnkozlov , @dean-long , @chhagedorn and @TobiHartmann thanks for the reviews!

-------------

PR: https://git.openjdk.org/jdk/pull/9295

From tholenstein at openjdk.org Fri Jul 1 13:38:49 2022
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Fri, 1 Jul 2022 13:38:49 GMT
Subject: Integrated: JDK-8277060: EXCEPTION_INT_DIVIDE_BY_ZERO in TypeAryPtr::dump2 with -XX:+TracePhaseCCP
In-Reply-To:
References:
Message-ID: <8MtRvEp-bA9urcuGQZSbelxzdaeZTGn_UqWN4uMT0YI=.21077193-afd9-407c-90dc-8e66d60d9c44@github.com>

On Mon, 27 Jun 2022 08:45:19 GMT, Tobias Holenstein wrote:

> `-XX:+TracePhaseCCP` fails in `TypeAryPtr::dump2` when `_offset` >= `header_size` and the basic type of the array element (`elem()->basic_type()`) is `T_ILLEGAL`. This case needs to be handled separately and print `+any`. Otherwise calling `type2aelembytes(T_ILLEGAL)` would lead to an out-of-bounds array access, because `T_ILLEGAL` has int value 99 and `_type2aelembytes[]` only has size 20. That `type2aelembytes(T_ILLEGAL)` returns zero and therefore triggers the `EXCEPTION_INT_DIVIDE_BY_ZERO` was luck. Therefore an assert was added to `type2aelembytes` to catch out-of-bounds accesses.
>
> In the test case node `827 CMoveP` has base type `T_ILLEGAL` because `elem()->base()` is `Type::Bottom`. Normally an array would have either type `int[]` or `long[]`. Because we assign it to `Object`, the `Object` has type `bottom[int:1]:NotNull` because there is no common supertype of `long[int:1]:NotNull:exact` and `int[int:1]:NotNull:exact`. In normal Java we could not copy the `Object srcArrLocal` to `int[] dstArr`, because we would need to access `srcArrLocal[]`, which is not possible for `Object`. But using `UNSAFE.copyMemory` this is allowed. Therefore the printing code has to be adjusted to support this case.
>
> https://github.com/openjdk/jdk/blob/6605d1614db2de302ebaf90863dcd2585b5c27ba/test/hotspot/jtreg/compiler/debug/TestTracePhaseCCP.java#L49-L51
>
> ![T_ILLEGAL](https://user-images.githubusercontent.com/71546117/176450705-12d4dc5c-f80d-4a7f-ae60-d71ddd089678.png)

This pull request has now been integrated.

Changeset: b9b900a6
Author: Tobias Holenstein
URL: https://git.openjdk.org/jdk/commit/b9b900a61ca914c7931d69bd4a8aeaa948be1d64
Stats: 74 lines in 3 files changed: 70 ins; 0 del; 4 mod

8277060: EXCEPTION_INT_DIVIDE_BY_ZERO in TypeAryPtr::dump2 with -XX:+TracePhaseCCP

Reviewed-by: kvn, thartmann, chagedorn, dlong

-------------

PR: https://git.openjdk.org/jdk/pull/9295

From hseigel at openjdk.org Fri Jul 1 14:35:49 2022
From: hseigel at openjdk.org (Harold Seigel)
Date: Fri, 1 Jul 2022 14:35:49 GMT
Subject: RFR: 8289534: Change 'uncomplicated' hotspot runtime options [v2]
In-Reply-To:
References:
Message-ID:

> Please review this small fix to change range constrained JVM runtime options from 64 bits to 32 bits. This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>
> Thanks, Harold

Harold Seigel has updated the pull request incrementally with one additional commit since the last revision:

  Fix copyright date

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9338/files
  - new: https://git.openjdk.org/jdk/pull/9338/files/b71c60d8..7c973070

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9338&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9338&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/9338.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9338/head:pull/9338

PR: https://git.openjdk.org/jdk/pull/9338

From hseigel at openjdk.org Fri Jul 1 14:35:50 2022
From: hseigel at openjdk.org (Harold Seigel)
Date: Fri, 1 Jul 2022 14:35:50 GMT
Subject: RFR: 8289534: Change 'uncomplicated' hotspot runtime options
In-Reply-To:
References:
Message-ID: <8E9_o0CLkKkyN0H3XRgZ9AMoWczAwVD7U8xy69jvXIk=.77b3b599-3584-4dc4-b838-ff7ecfdccfde@github.com>

On Thu, 30 Jun 2022 18:39:58 GMT, Harold Seigel wrote:

> Please review this small fix to change range constrained JVM runtime options from 64 bits to 32 bits. This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>
> Thanks, Harold

Thanks Coleen and David for the reviews!

-------------

PR: https://git.openjdk.org/jdk/pull/9338

From hseigel at openjdk.org Fri Jul 1 14:35:52 2022
From: hseigel at openjdk.org (Harold Seigel)
Date: Fri, 1 Jul 2022 14:35:52 GMT
Subject: RFR: 8289534: Change 'uncomplicated' hotspot runtime options [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 12:28:35 GMT, David Holmes wrote:

>> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix copyright date
>
> src/hotspot/share/runtime/perfMemory.cpp line 2:
>
>> 1: /*
>> 2:  * Copyright (c) 2001, 2021, Oracle and/or its affiliates. All rights reserved.
>
> 2022

Fixed. Thanks!
-------------

PR: https://git.openjdk.org/jdk/pull/9338

From hseigel at openjdk.org Fri Jul 1 14:35:53 2022
From: hseigel at openjdk.org (Harold Seigel)
Date: Fri, 1 Jul 2022 14:35:53 GMT
Subject: Integrated: 8289534: Change 'uncomplicated' hotspot runtime options
In-Reply-To:
References:
Message-ID: <2BSUfqtiSzV5aQTgb_etNzaBTT_Xp7gaPBhRVMQw0sI=.3adeb8a8-f159-4409-b1dd-3cd458552b56@github.com>

On Thu, 30 Jun 2022 18:39:58 GMT, Harold Seigel wrote:

> Please review this small fix to change range constrained JVM runtime options from 64 bits to 32 bits. This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>
> Thanks, Harold

This pull request has now been integrated.

Changeset: 09b4032f
Author: Harold Seigel
URL: https://git.openjdk.org/jdk/commit/09b4032f8b07335729e71b16b8f735514f3aebce
Stats: 42 lines in 10 files changed: 1 ins; 0 del; 41 mod

8289534: Change 'uncomplicated' hotspot runtime options

Reviewed-by: coleenp, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/9338

From duke at openjdk.org Fri Jul 1 15:05:16 2022
From: duke at openjdk.org (Justin Gu)
Date: Fri, 1 Jul 2022 15:05:16 GMT
Subject: RFR: 8289164: Convert ResolutionErrorTable to use ResourceHashtable
Message-ID: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com>

Please review my change of converting the resolutionErrorTable from hashtable to resource hashtable. I tested my changes with a mach5 tier1-4 test.
-------------

Commit messages:
 - Add StackObj for iterators
 - Fix global operator error
 - Fix whitespace
 - 8267935: Convert ResolutionErrorTable to Resource Hashtable
 - 8289164: Convert ResolutionErrorTable to use ResourceHashtable

Changes: https://git.openjdk.org/jdk/pull/9337/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9337&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8289164
  Stats: 294 lines in 7 files changed: 115 ins; 111 del; 68 mod
  Patch: https://git.openjdk.org/jdk/pull/9337.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9337/head:pull/9337

PR: https://git.openjdk.org/jdk/pull/9337

From iklam at openjdk.org Fri Jul 1 16:38:46 2022
From: iklam at openjdk.org (Ioi Lam)
Date: Fri, 1 Jul 2022 16:38:46 GMT
Subject: RFR: 8289164: Convert ResolutionErrorTable to use ResourceHashtable
In-Reply-To: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com>
References: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com>
Message-ID:

On Thu, 30 Jun 2022 15:15:45 GMT, Justin Gu wrote:

> Please review my change of converting the resolutionErrorTable from hashtable to resource hashtable. I tested my changes with a mach5 tier1-4 test.

This looks very nice and much simpler than before. I found one issue that needs to be fixed.

src/hotspot/share/classfile/resolutionErrors.cpp line 144:

> 142:   bool do_entry(uintptr_t key, ResolutionErrorEntry* value) {
> 143:     ConstantPool* pool = value -> pool();
> 144:     return !(pool->pool_holder()->is_loader_alive());

The `ResolutionErrorEntry` also needs to be freed. I think it can be done like this:

    if (!(pool->pool_holder()->is_loader_alive())) {
      delete value;
      return true;
    } else {
      return false;
    }

You can add some `tty->print_cr` in `ResolutionErrorEntry::~ResolutionErrorEntry()` to verify that the destructor is actually called (I believe it's not called with the current version of this PR).
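A standalone sketch of the suggestion above, under stated assumptions: `Entry` and `Table` are hypothetical stand-ins for `ResolutionErrorEntry` and the hashtable (they are not HotSpot's classes), and a static counter takes the place of the `tty->print_cr` call in the destructor. It only illustrates that a remove-if-true callback has to free the value itself before returning true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for ResolutionErrorEntry. The destructor bumps a
// counter so a test can observe whether it actually ran.
struct Entry {
  static int destroyed;
  bool loader_alive;
  explicit Entry(bool alive) : loader_alive(alive) {}
  ~Entry() { ++destroyed; }
};
int Entry::destroyed = 0;

// Hypothetical stand-in for the table. unlink() drops every entry for which
// the callback returns true; it does not free the value on its own.
struct Table {
  std::unordered_map<std::uintptr_t, Entry*> map;

  template <typename F>
  void unlink(F do_entry) {
    for (auto it = map.begin(); it != map.end();) {
      if (do_entry(it->first, it->second)) {
        it = map.erase(it);
      } else {
        ++it;
      }
    }
  }
};

// Callback in the shape suggested in the review: delete the value first,
// then report that the entry should be removed.
inline bool purge_dead(std::uintptr_t /*key*/, Entry* value) {
  if (!value->loader_alive) {
    delete value;
    return true;
  }
  return false;
}

// Builds a table with one live and two dead entries, purges it, and returns
// the number of destructor calls seen (or -1 if the wrong entries remain).
inline int demo_purge() {
  Entry::destroyed = 0;
  Table t;
  t.map[1] = new Entry(true);
  t.map[2] = new Entry(false);
  t.map[3] = new Entry(false);
  t.unlink(purge_dead);
  int freed = Entry::destroyed;
  std::size_t remaining = t.map.size();
  for (auto& kv : t.map) delete kv.second;  // free the survivor
  return remaining == 1 ? freed : -1;
}
```

If `purge_dead` returned true without the `delete`, `demo_purge()` would report zero destructor calls, which is the leak the review comment describes.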
Or, you can set a breakpoint there inside a debugger.

Similar changes are needed in `ResolutionErrorDeleteIterate`.

-------------

Changes requested by iklam (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9337

From sspitsyn at openjdk.org Fri Jul 1 18:01:11 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 1 Jul 2022 18:01:11 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated
Message-ID:

This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.

-------------

Commit messages:
 - fixed one trailing space issue
 - 8288703: GetThreadState returns 0 for virtual thread that has terminated

Changes: https://git.openjdk.org/jdk19/pull/102/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk19&pr=102&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8288703
  Stats: 51 lines in 3 files changed: 46 ins; 2 del; 3 mod
  Patch: https://git.openjdk.org/jdk19/pull/102.diff
  Fetch: git fetch https://git.openjdk.org/jdk19 pull/102/head:pull/102

PR: https://git.openjdk.org/jdk19/pull/102

From duke at openjdk.org Fri Jul 1 18:48:54 2022
From: duke at openjdk.org (Justin Gu)
Date: Fri, 1 Jul 2022 18:48:54 GMT
Subject: RFR: 8289164: Convert ResolutionErrorTable to use ResourceHashtable [v2]
In-Reply-To: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com>
References: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com>
Message-ID:

> Please review my change of converting the resolutionErrorTable from hashtable to resource hashtable. I tested my changes with a mach5 tier1-4 test.

Justin Gu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  8289164: Convert ResolutionErrorTable to use ResourceHashtable

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9337/files
  - new: https://git.openjdk.org/jdk/pull/9337/files/44fc9cfc..cad56c8d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9337&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9337&range=00-01

  Stats: 14 lines in 1 file changed: 10 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/9337.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9337/head:pull/9337

PR: https://git.openjdk.org/jdk/pull/9337

From alanb at openjdk.org Fri Jul 1 18:58:38 2022
From: alanb at openjdk.org (Alan Bateman)
Date: Fri, 1 Jul 2022 18:58:38 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 17:47:51 GMT, Serguei Spitsyn wrote:

> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.

The fix looks good but I'm wondering why the SelfSuspendDisablerTest is being used to test this bug. I guess I expected to see a test for GetThreadState instead.

test/hotspot/jtreg/serviceability/jvmti/vthread/SelfSuspendDisablerTest/SelfSuspendDisablerTest.java line 60:

> 58:     }
> 59:
> 60:     private static void testJvmtiThreadState(Thread thread, int expectedState) throws RuntimeException {

Minor nit, "throws RuntimeException" is not needed here.

test/hotspot/jtreg/serviceability/jvmti/vthread/SelfSuspendDisablerTest/libSelfSuspendDisablerTest.cpp line 68:

> 66: }
> 67:
> 68: }

It might be helpful to add

    // extern "C"

after the brace, as it confused me initially as to why there are two braces.

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From amenkov at openjdk.org Fri Jul 1 18:58:39 2022
From: amenkov at openjdk.org (Alex Menkov)
Date: Fri, 1 Jul 2022 18:58:39 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 17:47:51 GMT, Serguei Spitsyn wrote:

> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.

Marked as reviewed by amenkov (Reviewer).

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From cjplummer at openjdk.org Fri Jul 1 20:02:40 2022
From: cjplummer at openjdk.org (Chris Plummer)
Date: Fri, 1 Jul 2022 20:02:40 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 17:47:51 GMT, Serguei Spitsyn wrote:

> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.

Marked as reviewed by cjplummer (Reviewer).

test/hotspot/jtreg/serviceability/jvmti/vthread/SelfSuspendDisablerTest/SelfSuspendDisablerTest.java line 102:

> 100:         }
> 101:
> 102:         testJvmtiThreadState(t2, SUSPENDED);

Not a useful check after the isSuspended(t2) call above, but no harm in it either.

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Fri Jul 1 21:48:29 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 1 Jul 2022 21:48:29 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v2]
In-Reply-To:
References:
Message-ID:

> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.
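
The failure mode in that description can be sketched in isolation. This is a simplified model, not the HotSpot code: `ThreadObj`, `lookup_buggy`, `lookup_fixed`, and `get_thread_state` are hypothetical names, the value 1 for the alive state is assumed, and the caller tolerating the not-alive error is a guess at the shape of the real caller. It shows how returning before the out-parameter is set leaves the caller with nothing to derive a state from, so it reports 0 instead of 2.

```cpp
#include <cassert>

// 2 is JVMTI_THREAD_STATE_TERMINATED per the description above; the value
// used for the alive state is an assumption of this sketch.
constexpr int STATE_ALIVE = 1;
constexpr int STATE_TERMINATED = 2;

enum Err { ERR_NONE, ERR_THREAD_NOT_ALIVE };

// Hypothetical stand-in for the thread object behind `thread_oop`.
struct ThreadObj {
  bool terminated;
};

// Buggy shape: the liveness check returns before the out-parameter is set.
inline Err lookup_buggy(ThreadObj* t, ThreadObj** thread_oop_p) {
  if (t->terminated) return ERR_THREAD_NOT_ALIVE;
  *thread_oop_p = t;
  return ERR_NONE;
}

// Fixed shape: set the out-parameter first, then report the error.
inline Err lookup_fixed(ThreadObj* t, ThreadObj** thread_oop_p) {
  *thread_oop_p = t;
  if (t->terminated) return ERR_THREAD_NOT_ALIVE;
  return ERR_NONE;
}

// Caller modeled loosely on GetThreadState: a not-alive result is tolerated
// (assumption) and the state is computed from whatever thread_oop was set.
template <typename Lookup>
int get_thread_state(ThreadObj* t, Lookup lookup) {
  ThreadObj* thread_oop = nullptr;
  (void)lookup(t, &thread_oop);
  if (thread_oop == nullptr) return 0;  // nothing to inspect
  return thread_oop->terminated ? STATE_TERMINATED : STATE_ALIVE;
}
```

With a terminated `ThreadObj`, `get_thread_state(&t, lookup_buggy)` yields 0 while `get_thread_state(&t, lookup_fixed)` yields 2.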

Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:

  fix serviceability/jvmti/thread/thrstat03 to except correct GetThreadState result for terminated vthreads

-------------

Changes:
  - all: https://git.openjdk.org/jdk19/pull/102/files
  - new: https://git.openjdk.org/jdk19/pull/102/files/85cb92ba..7199e962

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk19&pr=102&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk19&pr=102&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk19/pull/102.diff
  Fetch: git fetch https://git.openjdk.org/jdk19 pull/102/head:pull/102

PR: https://git.openjdk.org/jdk19/pull/102

From ccheung at openjdk.org Fri Jul 1 21:49:47 2022
From: ccheung at openjdk.org (Calvin Cheung)
Date: Fri, 1 Jul 2022 21:49:47 GMT
Subject: RFR: 8289230: Move PlatformXXX class declarations out of os_xxx.hpp [v3]
In-Reply-To:
References: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com>
Message-ID:

On Tue, 28 Jun 2022 20:13:01 GMT, Ioi Lam wrote:

>> There are only two implementations of these classes (one for windows, and one for posix):
>>
>> - PlatformEvent
>> - PlatformParker
>> - PlatformMutex
>> - PlatformMonitor
>> - ThreadCrashProtection
>>
>> Before this PR, these classes are declared in os_xxx.hpp. This causes excessive inclusion of the large header file os.hpp by popular headers such as mutex.hpp, which needs only the declaration of PlatformMutex but not the other stuff in os.hpp
>>
>> This PR moves the declarations to park_posix.hpp, mutex_posix.hpp, etc.
>>
>> Note: ideally, the definition of PlatformParker/PlatformEvent should be moved to park_posix.cpp, and PlatformMutex/PlatformMonitor should be moved to mutex_posix.cpp. However, the definition of these 4 classes are intertwined, so I'll leave them inside os_posix.cpp for now. (Same for the Windows version).
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
>   @coleenp comments

Looks good. Just one nit.

src/hotspot/share/runtime/mutex.cpp line 334:

> 332: }
> 333:
> 334:

Line 334 deleted by accident?

-------------

Marked as reviewed by ccheung (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9303

From cjplummer at openjdk.org Fri Jul 1 22:08:44 2022
From: cjplummer at openjdk.org (Chris Plummer)
Date: Fri, 1 Jul 2022 22:08:44 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 21:48:29 GMT, Serguei Spitsyn wrote:

>> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix serviceability/jvmti/thread/thrstat03 to except correct GetThreadState result for terminated vthreads

Marked as reviewed by cjplummer (Reviewer).

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Sat Jul 2 03:13:26 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 2 Jul 2022 03:13:26 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 18:53:36 GMT, Alan Bateman wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix serviceability/jvmti/thread/thrstat03 to except correct GetThreadState result for terminated vthreads
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/SelfSuspendDisablerTest/libSelfSuspendDisablerTest.cpp line 68:
>
>> 66: }
>> 67:
>> 68: }
>
> it might be helpful to add
>
>     // extern "C"
>
> after the brace as it confused me initially as to why there are two braces.

Good catch. Fixed.

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Sat Jul 2 03:28:42 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 2 Jul 2022 03:28:42 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 18:51:55 GMT, Alan Bateman wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix serviceability/jvmti/thread/thrstat03 to except correct GetThreadState result for terminated vthreads
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/SelfSuspendDisablerTest/SelfSuspendDisablerTest.java line 60:
>
>> 58:     }
>> 59:
>> 60:     private static void testJvmtiThreadState(Thread thread, int expectedState) throws RuntimeException {
>
> Minor nit, "throws RuntimeException" is not needed here.

Ah, yes. I was thinking it is not needed but forgot to double-check. Fixed now.

> The fix looks good but I'm wondering why the SelfSuspendDisablerTest is being used to test this bug. I guess I expected to see a test for GetThreadState instead.

GetThreadState, given its instrumental role, can be used by many tests.
My initial intent was to check the terminated virtual thread case.
It looked like unneeded overhead to create a new test for this.
So, I found this small test, which already has convenient infrastructure to recreate the needed conditions.
Then I decided to extend the GetThreadState coverage in this test a little bit.
I can create a GetThreadState-specific test if you think it is worth it.

Interestingly enough, I've found an existing test which already had the needed coverage:
` test/hotspot/jtreg/serviceability/jvmti/thread/GetThreadState/thrstat03`
But this test had been adjusted to adapt to the incorrect GetThreadState result for virtual threads, so I had to fix it now.
Strictly speaking, the update of SelfSuspendDisablerTest is not needed.
But I feel it is worth keeping.

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Sat Jul 2 03:28:46 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 2 Jul 2022 03:28:46 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v2]
In-Reply-To:
References:
Message-ID: <-C3_lYflAJGrTkf1z6v7hmoopY0ROdiyRgBX63vSQco=.17b26b9e-f840-4eb9-a2f7-eb813aacb26a@github.com>

On Sat, 2 Jul 2022 03:10:05 GMT, Serguei Spitsyn wrote:

>> test/hotspot/jtreg/serviceability/jvmti/vthread/SelfSuspendDisablerTest/libSelfSuspendDisablerTest.cpp line 68:
>>
>>> 66: }
>>> 67:
>>> 68: }
>>
>> it might be helpful to add
>>
>>     // extern "C"
>>
>> after the brace as it confused me initially as to why there are two braces.
>
> Good catch. Fixed.

Good suggestion. Resolved.

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Sat Jul 2 03:28:43 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 2 Jul 2022 03:28:43 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 19:59:25 GMT, Chris Plummer wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix serviceability/jvmti/thread/thrstat03 to except correct GetThreadState result for terminated vthreads
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/SelfSuspendDisablerTest/SelfSuspendDisablerTest.java line 102:
>
>> 100:         }
>> 101:
>> 102:         testJvmtiThreadState(t2, SUSPENDED);
>
> Not a useful check after the isSuspended(t2) call above, but no harm in it either.

You are right in both cases. :)

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Sat Jul 2 03:35:11 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 2 Jul 2022 03:35:11 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v3]
In-Reply-To:
References:
Message-ID:

> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.

Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:

  resolved minor comments for the SelfSuspendDisablerTest test

-------------

Changes:
  - all: https://git.openjdk.org/jdk19/pull/102/files
  - new: https://git.openjdk.org/jdk19/pull/102/files/7199e962..8c9e104b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk19&pr=102&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk19&pr=102&range=01-02

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk19/pull/102.diff
  Fetch: git fetch https://git.openjdk.org/jdk19 pull/102/head:pull/102

PR: https://git.openjdk.org/jdk19/pull/102

From iklam at openjdk.org Sat Jul 2 04:26:48 2022
From: iklam at openjdk.org (Ioi Lam)
Date: Sat, 2 Jul 2022 04:26:48 GMT
Subject: RFR: 8289230: Move PlatformXXX class declarations out of os_xxx.hpp [v4]
In-Reply-To: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com>
References: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com>
Message-ID:

> There are only two implementations of these classes (one for windows, and one for posix):
>
> - PlatformEvent
> - PlatformParker
> - PlatformMutex
> - PlatformMonitor
> - ThreadCrashProtection
>
> Before this PR, these classes are declared in os_xxx.hpp. This causes excessive inclusion of the large header file os.hpp by popular headers such as mutex.hpp, which needs only the declaration of PlatformMutex but not the other stuff in os.hpp
>
> This PR moves the declarations to park_posix.hpp, mutex_posix.hpp, etc.
>
> Note: ideally, the definition of PlatformParker/PlatformEvent should be moved to park_posix.cpp, and PlatformMutex/PlatformMonitor should be moved to mutex_posix.cpp. However, the definition of these 4 classes are intertwined, so I'll leave them inside os_posix.cpp for now. (Same for the Windows version).

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:

 - removed unrelated newline change
 - Merge branch 'master' into 8289230-move-Platform-classes-out-of-os-xxx-hpp
 - @coleenp comments
 - fixed comments
 - fixed windows
 - Moved PlatformMutex/PlatformMonitor
 - move-PlatformParker-out-of-os-xxx-hpp
 - Moved ThreadCrashProtection

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9303/files
  - new: https://git.openjdk.org/jdk/pull/9303/files/9d502ffb..bafda85c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9303&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9303&range=02-03

  Stats: 8540 lines in 227 files changed: 5586 ins; 1406 del; 1548 mod
  Patch: https://git.openjdk.org/jdk/pull/9303.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9303/head:pull/9303

PR: https://git.openjdk.org/jdk/pull/9303

From alanb at openjdk.org Sat Jul 2 05:09:41 2022
From: alanb at openjdk.org (Alan Bateman)
Date: Sat, 2 Jul 2022 05:09:41 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v3]
In-Reply-To:
References:
Message-ID:

On Sat, 2 Jul 2022 03:35:11 GMT, Serguei Spitsyn wrote:

>> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   resolved minor comments for the SelfSuspendDisablerTest test

Marked as reviewed by alanb (Reviewer).

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From alanb at openjdk.org Sat Jul 2 05:09:43 2022
From: alanb at openjdk.org (Alan Bateman)
Date: Sat, 2 Jul 2022 05:09:43 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v3]
In-Reply-To:
References:
Message-ID:

On Sat, 2 Jul 2022 03:22:31 GMT, Serguei Spitsyn wrote:

> Interestingly enough, I've found an existing test which already had the needed coverage:
> ` test/hotspot/jtreg/serviceability/jvmti/thread/GetThreadState/thrstat03`
> But this test had been adjusted to adapt to the incorrect GetThreadState result for virtual threads, so I had to fix it now.
> Strictly speaking, the update of SelfSuspendDisablerTest is not needed.
> But I feel it is worth keeping.

This goes to my surprise that we didn't have a test already, but you've found it, and found that it should have caught this bug except that it had been changed. Good to find this, and I think that is the right place to have a unit test for GetThreadState.

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Sat Jul 2 05:46:25 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 2 Jul 2022 05:46:25 GMT
Subject: [jdk19] RFR: 8288703: GetThreadState returns 0 for virtual thread that has terminated [v3]
In-Reply-To:
References:
Message-ID: <-SZHKB2nKh--xjsLg9jmFea0e4wyCRqpmBBHYiyhHUc=.50825af6-60f7-40c4-885b-962862a2a250@github.com>

On Sat, 2 Jul 2022 03:35:11 GMT, Serguei Spitsyn wrote:

>> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   resolved minor comments for the SelfSuspendDisablerTest test

Alan, Alex and Chris, thank you for reviews and comments!

-------------

PR: https://git.openjdk.org/jdk19/pull/102

From sspitsyn at openjdk.org Sat Jul 2 05:46:26 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 2 Jul 2022 05:46:26 GMT
Subject: [jdk19] Integrated: 8288703: GetThreadState returns 0 for virtual thread that has terminated
In-Reply-To:
References:
Message-ID:

On Fri, 1 Jul 2022 17:47:51 GMT, Serguei Spitsyn wrote:

> This is fixing the JVM TI GetThreadState issue which returns for terminated virtual thread state = 0 instead of 2 (`JVMTI_THREAD_STATE_TERMINATED`). The problem was in the function `JvmtiEnvBase::get_threadOop_and_JavaThread` which does a check and returns JVMTI_ERROR_THREAD_NOT_ALIVE a little bit early (before the values of `java_thread` and `thread_oop` are set). This was a root cause of the `GetThreadState` incorrect behavior.

This pull request has now been integrated.
Changeset: 9515560c Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk19/commit/9515560c54438156b37f1549229bcb5535df5fd1 Stats: 52 lines in 4 files changed: 46 ins; 2 del; 4 mod 8288703: GetThreadState returns 0 for virtual thread that has terminated Reviewed-by: alanb, amenkov, cjplummer ------------- PR: https://git.openjdk.org/jdk19/pull/102 From wkudla.kernel at gmail.com Sat Jul 2 06:42:38 2022 From: wkudla.kernel at gmail.com (Wojciech Kudla) Date: Sat, 2 Jul 2022 07:42:38 +0100 Subject: Obsoleting JavaCritical In-Reply-To: <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> Message-ID: Hi Maurizio, Thanks for staying on this. > Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? Off the top of my head: clock_gettime, recvmsg, recvmmsg, sendmsg, sendmmsg, select, getpid, getcpu, getrusage. > Also, could you please tell us whether any of these calls need to interact with Java arrays? No arrays or objects of any type are involved. Everything happens by means of passing raw pointers as longs and using other primitive types as function arguments. > In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? Critical JNI natives are used solely to remove the cost of transitions. We don't get anywhere near the Java heap in native code.
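To make the shape of such calls concrete, here is a minimal C++ sketch (an illustration under stated assumptions, not the code from the application discussed). It reads `CLOCK_REALTIME` through `clock_gettime` and returns nanoseconds since the epoch as a single 64-bit integer, so only primitive values cross the call boundary. The function name `realtime_nanos` is hypothetical, and the JNI or critical-native wrapper that would expose it to Java is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>   // POSIX clock_gettime and CLOCK_REALTIME

// Hypothetical helper (name is an assumption): wall-clock time in
// nanoseconds since the Unix epoch. It takes no arguments and returns a
// primitive, so it never needs to touch Java arrays or heap objects.
extern "C" int64_t realtime_nanos() {
    timespec ts{};                               // zero-initialized
    int rc = clock_gettime(CLOCK_REALTIME, &ts);
    assert(rc == 0);  // failure is unexpected for CLOCK_REALTIME
    (void)rc;         // avoid an unused-variable warning under NDEBUG
    // Combine seconds and nanoseconds into one 64-bit value.
    return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

The work per call is a single `clock_gettime`, which is presumably why the cost of the surrounding thread transition is the part that matters in this discussion.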
In general I think it makes a lot of sense for Java as a language/platform to have some guards around unsafe code, but on the other hand the popularity of libraries employing Unsafe and their success in more performance-oriented corners of software engineering is a clear indicator there is a need for the JVM to provide access to more low-level primitives and mechanisms. I think it's entirely fair to tell developers that all bets are off when they get into some non-idiomatic scenarios but please don't take away a feature that greatly contributed to Java's success. Kind regards, Wojtek On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < maurizio.cimadamore at oracle.com> wrote: > Hi Wojciech, > picking up this thread again. After some internal discussion, we realize > that we don't know enough about your use case. While re-enabling JNI > critical would obviously provide a quick fix, we're afraid that (a) > developers might end up depending on JNI critical when they don't need to > (perhaps also unaware of the consequences of depending on it) and (b) that > there might actually be _better_ (as in: much faster) solutions than using > critical native calls to address at least some of your use cases (that > seemed to be the case with the clock_gettime example you mentioned). Could > you please provide a rough list of the native calls you make where you > believe critical JNI is having a real impact in the performance of your > application? Also, could you please tell us whether any of these calls need > to interact with Java arrays? In other words, do you use critical JNI to > remove the cost associated with thread transitions, or are you also taking > advantage of accessing on-heap memory _directly_ from native code? > > Regards > Maurizio > On 13/06/2022 21:38, Wojciech Kudla wrote: > > Hi Mark, > > Thanks for your input and apologies for the delayed response. 
> > > If the platform included, say, an intrinsified System.nanoRealTime() > method that returned clock_gettime(CLOCK_REALTIME), how much would > that help developers in your unnamed industry? > > Exposing a realtime clock with nanosecond granularity in the JDK would be a > great step forward. I should have made it clear that I represent the fintech > corner (investment banking to be exact) but the issues my message touches > upon span areas such as HPC, audio processing, gaming, and the defense industry > so it's not like we have an isolated case. > > > In a similar vein, if people are finding it necessary to "replace parts > of NIO with hand-crafted native code" then it would be interesting to > understand what their requirements are > > As for the other example I provided with making very short lived syscalls > such as recvmsg/recvmmsg the premise is getting access to hardware > timestamps on the ingress and egress ends as well as enabling batch receive > with a single syscall and otherwise exploiting features unavailable from > the JDK (like access to CMSG interface, scatter/gather, etc). > There are also other examples of calls that we'd love to make often and at > lowest possible cost (e.g. getrusage) but I'm not sure if there's a strong > case for some of these ideas, that's why it might be worth looking into > a more generic approach for performance sensitive code. > Hope this does a better job at explaining where we're coming from than my > previous messages. > > Thanks, > W > > On Tue, Jun 7, 2022 at 6:31 PM wrote: > >> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >> >> CLOCK_REALTIME. >> > >> > Unfortunately System.currentTimeMillis() offers only millisecond >> > granularity which is the reason why our industry has to resort to >> > clock_gettime.
>> >> If the platform included, say, an intrinsified System.nanoRealTime() >> method that returned clock_gettime(CLOCK_REALTIME), how much would >> that help developers in your unnamed industry? >> >> In a similar vein, if people are finding it necessary to "replace parts >> of NIO with hand-crafted native code" then it would be interesting to >> understand what their requirements are. Some simple enhancements to >> the NIO API would be much less costly to design and implement than a >> generalized user-level native-call intrinsification mechanism. >> >> - Mark >> > From sspitsyn at openjdk.org Sat Jul 2 07:09:46 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 2 Jul 2022 07:09:46 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v2] In-Reply-To: References: Message-ID: On Sat, 25 Jun 2022 01:23:47 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call.
A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove outdated comment" > > This reverts commit 8f571d76e34bc64ceb31894184fba4b909e8fbfe. src/hotspot/share/runtime/sharedRuntime.cpp line 1563: > 1561: JRT_BLOCK_ENTRY(address, SharedRuntime::resolve_static_call_C(JavaThread* current )) > 1562: methodHandle callee_method; > 1563: bool enter_special = false; One micro suggestion is to rename: `enter_special => is_enter_special`. ------------- PR: https://git.openjdk.org/jdk19/pull/66 From sspitsyn at openjdk.org Sat Jul 2 07:20:44 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 2 Jul 2022 07:20:44 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v2] In-Reply-To: References: Message-ID: <_F4Jxh8T1Xb-Td4mGBGmgvtVr2NVCG_oWp7nhvk_Eqw=.24bb6b8a-8880-4b9d-b34d-a2c70691f0f4@github.com> On Sat, 25 Jun 2022 01:23:47 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. 
In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove outdated comment" > > This reverts commit 8f571d76e34bc64ceb31894184fba4b909e8fbfe. src/hotspot/share/runtime/sharedRuntime.cpp line 1582: > 1580: // but in interp_only_mode we need to go to the interpreted entry > 1581: // The c2i won't patch in this mode -- see fixup_callers_callsite > 1582: return callee_method->get_c2i_entry(); Nit: Dots at the end of lines 1580-1581 would be nice to follow comments style in this file. src/hotspot/share/runtime/sharedRuntime.cpp line 2018: > 2016: if (JavaThread::current()->is_interp_only_mode()) > 2017: return; > 2018: } Nit - micro simplification: if (nm->method()->is_continuation_enter_intrinsic() && JavaThread::current()->is_interp_only_mode()) { return; } ------------- PR: https://git.openjdk.org/jdk19/pull/66 From sspitsyn at openjdk.org Sat Jul 2 07:40:49 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 2 Jul 2022 07:40:49 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v2] In-Reply-To: References: Message-ID: On Sat, 25 Jun 2022 01:23:47 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. 
But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove outdated comment" > > This reverts commit 8f571d76e34bc64ceb31894184fba4b909e8fbfe. How was this change tested? In fact, it is not easy to estimate the total impact of this change. I hope it impacts continuations only, but I am not very sure yet. src/hotspot/share/runtime/continuation.cpp line 315: > 313: thread->set_cont_fastpath_thread_state(fast); > 314: if (thread->is_interp_only_mode() && ContinuationEntry::enter_special() != nullptr) { > 315: ContinuationEntry::enter_special()->clear_continuation_enter_special_inline_caches(); Will this call impact all JavaThreads, not only the one passed in the argument? Just want to understand this better. Would it be worth adding a comment explaining this aspect (if applicable)?
------------- PR: https://git.openjdk.org/jdk19/pull/66 From jwilhelm at openjdk.org Sat Jul 2 11:15:24 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Sat, 2 Jul 2022 11:15:24 GMT Subject: RFR: Merge jdk19 Message-ID: Forwardport JDK 19 -> JDK 20 ------------- Commit messages: - Merge remote-tracking branch 'jdk19/master' into Merge_jdk19 - 8245268: -Xcomp is missing from java launcher documentation - 8288703: GetThreadState returns 0 for virtual thread that has terminated - 8288854: getLocalGraphicsEnvironment() on for multi-screen setups throws exception NPE - 8280320: C2: Loop opts are missing during OSR compilation - 8289570: SegmentAllocator:allocateUtf8String(String str) default behavior mismatch to spec - 8289585: ProblemList sun/tools/jhsdb/JStackStressTest.java on linux-aarch64 - 8289549: ISO 4217 Amendment 172 Update - 8284358: Unreachable loop is not removed from C2 IR, leading to a broken graph The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=jdk&pr=9354&range=00.0 - jdk19: https://webrevs.openjdk.org/?repo=jdk&pr=9354&range=00.1 Changes: https://git.openjdk.org/jdk/pull/9354/files Stats: 382 lines in 14 files changed: 330 ins; 5 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/9354.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9354/head:pull/9354 PR: https://git.openjdk.org/jdk/pull/9354 From iklam at openjdk.org Sat Jul 2 14:47:39 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 2 Jul 2022 14:47:39 GMT Subject: RFR: 8289230: Move PlatformXXX class declarations out of os_xxx.hpp [v2] In-Reply-To: References: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com> Message-ID: On Tue, 28 Jun 2022 19:39:29 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed comments > > I very much like this incremental 
approach to reducing our over-inclusion. Thanks @coleenp @calvinccheung @dholmes-ora for the review. ------------- PR: https://git.openjdk.org/jdk/pull/9303 From iklam at openjdk.org Sat Jul 2 14:47:40 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 2 Jul 2022 14:47:40 GMT Subject: Integrated: 8289230: Move PlatformXXX class declarations out of os_xxx.hpp In-Reply-To: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com> References: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com> Message-ID: On Tue, 28 Jun 2022 06:16:21 GMT, Ioi Lam wrote: > There are only two implementations of these classes (one for windows, and one for posix): > > - PlatformEvent > - PlatformParker > - PlatformMutex > - PlatformMonitor > - ThreadCrashProtection > > Before this PR, these classes are declared in os_xxx.hpp. This causes excessive inclusion of the large header file os.hpp by popular headers such as mutex.hpp, which needs only the declaration of PlatformMutex but not the other stuff in os.hpp > > This PR moves the declarations to park_posix.hpp, mutex_posix.hpp, etc. > > Note: ideally, the definition of PlatformParker/PlatformEvent should be moved to park_posix.cpp, and PlatformMutex/PlatformMonitor should be moved to mutex_posix.cpp. However, the definition of these 4 classes are intertwined, so I'll leave them inside os_posix.cpp for now. (Same for the Windows version). This pull request has now been integrated. 
Changeset: cdf69792 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/cdf697925953f62e17a7916ba611d7e789f09edf Stats: 1172 lines in 36 files changed: 703 ins; 390 del; 79 mod 8289230: Move PlatformXXX class declarations out of os_xxx.hpp Reviewed-by: coleenp, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/9303 From kim.barrett at oracle.com Sat Jul 2 15:21:55 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 2 Jul 2022 15:21:55 +0000 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> Message-ID: > On Jun 30, 2022, at 1:21 AM, David Holmes wrote: > > Hi Thomas, > > On 30/06/2022 2:57 pm, Thomas Stüfe wrote: >> Hi, >> several functions in the os:: name scope are deliberately named like the official counterparts they replace: >> os::malloc, os::free, os::strdup, os::realloc, os::recv, os::send, os::connect, os::signal... >> There may be more. Some of them argument-match their counterparts (e.g. os::free), while others don't. >> Since the os:: variants can be called inside the os:: namespace with omitting the leading os::, name confusions are possible. "free(p)" means something different in global scope or inside an os:: function. >> This can lead to problems that are difficult to find, e.g., mismatched (os::)malloc->(os::)free with the potential to corrupt the C-heap: >> [...] So I wonder if we should do that. Rename os:: to something like os::. And what the prefix or suffix would be. > > It annoys me that we have to do such things. It would have made more sense for the standard C library routines to have a prefix that marked them as reserved identifiers rather than polluting the global namespace the way they did. But no one thinks of these things initially and by the time it is standardised it is too late to make such changes.
:( > > I'm not sure this is a problem we have to address, but if we choose to then I think we should try to make a general improvement to the way os is used. > > Maybe, as I think has been suggested before, we can move these out of the os class as they are not really about the os but the C library, and then any renaming that includes a prefix may not look so bad? > > Maybe lib::C_free(), lib::C_malloc() etc? A reminder that JDK-8214976 allows us to "poison" a function, other than in explicitly marked places. When I introduced that feature I only marked a small number of functions that were "easy". There are many other functions that seem like good candidates, but had more fan-out than I wanted in that change. For example, we could mark ::malloc, ::calloc, ::free, etc. as normally forbidden. From thomas.stuefe at gmail.com Sat Jul 2 16:57:56 2022 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Sat, 2 Jul 2022 18:57:56 +0200 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> Message-ID: On Sat, Jul 2, 2022 at 5:22 PM Kim Barrett wrote: > > On Jun 30, 2022, at 1:21 AM, David Holmes > wrote: > > > > Hi Thomas, > > > > On 30/06/2022 2:57 pm, Thomas Stüfe wrote: > >> Hi, > >> several functions in the os:: name scope are deliberately named like > the official counterparts they replace: > >> os::malloc, os::free, os::strdup, os::realloc, os::recv, os::send, > os::connect, os::signal... > >> There may be more. Some of them argument-match their counterparts (e.g. > os::free), while others don't. > >> Since the os:: variants can be called inside the os:: namespace with > omitting the leading os::, name confusions are possible. "free(p)" means > something different in global scope or inside an os:: function.
> >> This can lead to problems that are difficult to find, e.g., mismatched > (os::)malloc->(os::)free with the potential to corrupt the C-heap: > >> [...] So I wonder if we should do that. Rename os:: to > something like os::. And what the prefix or suffix would > be. > > > > It annoys me that we have to do such things. It would have made more > sense for the standard C library routines to have a prefix that marked them > as reserved identifiers rather than polluting the global namespace the way > they did. But no one thinks of these things initially and by the time it is > standardised it is too late to make such changes. :( > > > > I'm not sure this is a problem we have to address, but if we choose to > then I think we should try to make a general improvement to the way os is > used. > > > > Maybe, as I think has been suggested before, we can move these out of > the os class as they are not really about the os but the C library, and > then any renaming that includes a prefix may not look so bad? > > > > Maybe lib::C_free(), lib::C_malloc() etc? > > A reminder that JDK-8214976 allows us to "poison" a function, other than in > explicitly marked places. When I introduced that feature I only marked a > small > number of functions that were "easy". There are many other functions that > seem > like good candidates, but had more fan-out than I wanted in that change. > For > example, we could mark ::malloc, ::calloc, ::free, etc. as normally > forbidden. > > I really like this, in addition to the name change. Note however that we may still need to expose the raw functions to hotspot code for outlier cases. Something like "os::raw_malloc()" and "os::raw_free()" for "when you really really mean it". Incidentally, in our proprietary VM we have proprietary C-heap tracing, invented before NMT existed. That one covers the whole JDK, not only hotspot, and did not use malloc headers.
When we did this, we had a fun time hunting down all the system APIs the JDK uses that return C-heap and thus require the caller to raw-free() it. I remember being surprised at the number. This was across the whole JDK though, which uses more system APIs than hotspot. Still, with that in mind, we may need at least an os::raw_free(). From jwilhelm at openjdk.org Sat Jul 2 18:13:28 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Sat, 2 Jul 2022 18:13:28 GMT Subject: RFR: Merge jdk19 [v2] In-Reply-To: References: Message-ID: <8GdyOQFu7pc4QNML59F_8hw0Wfy7v2N0-acbUtFc4x0=.b42ce0cb-009d-41d4-9a73-14765ae033a7@github.com> > Forwardport JDK 19 -> JDK 20 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 199 commits: - Merge remote-tracking branch 'jdk19/master' into Merge_jdk19 - 8289603: Code change for JDK-8170762 breaks all build Reviewed-by: weijun - 8170762: Document that ISO10126Padding pads with random bytes Reviewed-by: weijun - 8289584: (fs) Print size values in java/nio/file/FileStore/Basic.java when they differ by > 1GiB Reviewed-by: alanb - 8289257: Some custom loader tests failed due to symbol refcount not decremented Reviewed-by: iklam, coleenp - 8289534: Change 'uncomplicated' hotspot runtime options Reviewed-by: coleenp, dholmes - 8289512: Fix GCC 12 warnings for adlc output_c.cpp Reviewed-by: kvn, lucy - 8277060: EXCEPTION_INT_DIVIDE_BY_ZERO in TypeAryPtr::dump2 with -XX:+TracePhaseCCP Reviewed-by: kvn, thartmann, chagedorn, dlong - 8288444: Remove the workaround for frame.pack() in ModalDialogTest Reviewed-by: azvegint - 8289434: x86_64: Improve comment on gen_continuation_enter() Reviewed-by: kvn - ...
and 189 more: https://git.openjdk.org/jdk/compare/f5cdabad...20b15114 ------------- Changes: https://git.openjdk.org/jdk/pull/9354/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9354&range=01 Stats: 79992 lines in 1361 files changed: 50048 ins; 16523 del; 13421 mod Patch: https://git.openjdk.org/jdk/pull/9354.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9354/head:pull/9354 PR: https://git.openjdk.org/jdk/pull/9354 From jwilhelm at openjdk.org Sat Jul 2 18:13:29 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Sat, 2 Jul 2022 18:13:29 GMT Subject: Integrated: Merge jdk19 In-Reply-To: References: Message-ID: On Sat, 2 Jul 2022 11:03:38 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 19 -> JDK 20 This pull request has now been integrated. Changeset: 70f56933 Author: Jesper Wilhelmsson URL: https://git.openjdk.org/jdk/commit/70f5693356277c0685668219a79819707d099d9f Stats: 382 lines in 14 files changed: 330 ins; 5 del; 47 mod Merge ------------- PR: https://git.openjdk.org/jdk/pull/9354 From duke at openjdk.org Sat Jul 2 18:42:04 2022 From: duke at openjdk.org (kristylee88) Date: Sat, 2 Jul 2022 18:42:04 GMT Subject: RFR: Merge jdk19 [v2] In-Reply-To: <8GdyOQFu7pc4QNML59F_8hw0Wfy7v2N0-acbUtFc4x0=.b42ce0cb-009d-41d4-9a73-14765ae033a7@github.com> References: <8GdyOQFu7pc4QNML59F_8hw0Wfy7v2N0-acbUtFc4x0=.b42ce0cb-009d-41d4-9a73-14765ae033a7@github.com> Message-ID: On Sat, 2 Jul 2022 18:13:28 GMT, Jesper Wilhelmsson wrote: >> Forwardport JDK 19 -> JDK 20 > > Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 199 commits: > > - Merge remote-tracking branch 'jdk19/master' into Merge_jdk19 > - 8289603: Code change for JDK-8170762 breaks all build > > Reviewed-by: weijun > - 8170762: Document that ISO10126Padding pads with random bytes > > Reviewed-by: weijun > - 8289584: (fs) Print size values in java/nio/file/FileStore/Basic.java when they differ by > 1GiB > > Reviewed-by: alanb > - 8289257: Some custom loader tests failed due to symbol refcount not decremented > > Reviewed-by: iklam, coleenp > - 8289534: Change 'uncomplicated' hotspot runtime options > > Reviewed-by: coleenp, dholmes > - 8289512: Fix GCC 12 warnings for adlc output_c.cpp > > Reviewed-by: kvn, lucy > - 8277060: EXCEPTION_INT_DIVIDE_BY_ZERO in TypeAryPtr::dump2 with -XX:+TracePhaseCCP > > Reviewed-by: kvn, thartmann, chagedorn, dlong > - 8288444: Remove the workaround for frame.pack() in ModalDialogTest > > Reviewed-by: azvegint > - 8289434: x86_64: Improve comment on gen_continuation_enter() > > Reviewed-by: kvn > - ... and 189 more: https://git.openjdk.org/jdk/compare/f5cdabad...20b15114 Marked as reviewed by kristylee88 at github.com (no known OpenJDK username). ------------- PR: https://git.openjdk.org/jdk/pull/9354 From kim.barrett at oracle.com Sat Jul 2 18:47:31 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 2 Jul 2022 18:47:31 +0000 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions?
In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> Message-ID: > On Jul 2, 2022, at 12:57 PM, Thomas Stüfe wrote: > > On Sat, Jul 2, 2022 at 5:22 PM Kim Barrett wrote: > > On Jun 30, 2022, at 1:21 AM, David Holmes wrote: > > > > Hi Thomas, > > > > On 30/06/2022 2:57 pm, Thomas Stüfe wrote: > >> Hi, > >> several functions in the os:: name scope are deliberately named like the official counterparts they replace: > >> os::malloc, os::free, os::strdup, os::realloc, os::recv, os::send, os::connect, os::signal... > >> There may be more. Some of them argument-match their counterparts (e.g. os::free), while others don't. > >> Since the os:: variants can be called inside the os:: namespace with omitting the leading os::, name confusions are possible. "free(p)" means something different in global scope or inside an os:: function. > >> This can lead to problems that are difficult to find, e.g., mismatched (os::)malloc->(os::)free with the potential to corrupt the C-heap: > >> [...] So I wonder if we should do that. Rename os:: to something like os::. And what the prefix or suffix would be. > > > > It annoys me that we have to do such things. It would have made more sense for the standard C library routines to have a prefix that marked them as reserved identifiers rather than polluting the global namespace the way they did. But no one thinks of these things initially and by the time it is standardised it is too late to make such changes. :( > > > > I'm not sure this is a problem we have to address, but if we choose to then I think we should try to make a general improvement to the way os is used. > > > > Maybe, as I think has been suggested before, we can move these out of the os class as they are not really about the os but the C library, and then any renaming that includes a prefix may not look so bad? > > > > Maybe lib::C_free(), lib::C_malloc() etc?
> > A reminder that JDK-8214976 allows us to "poison" a function, other than in > explicitly marked places. When I introduced that feature I only marked a small > number of functions that were "easy". There are many other functions that seem > like good candidates, but had more fan-out than I wanted in that change. For > example, we could mark ::malloc, ::calloc, ::free, etc. as normally forbidden. > > > I really like this, in addition to the name change. Note however that we may still need to expose the raw functions to hotspot code for outlier cases. Something like "os::raw_malloc()" and "os::raw_free()" for "when you really really mean it". Remember that JDK-8214976 provides a way to disable the poisoning in a specific context. So you can still use "::malloc()" (for example) where you need it; you just need to also say that's really what you meant. From thomas.stuefe at gmail.com Sat Jul 2 20:03:29 2022 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Sat, 2 Jul 2022 22:03:29 +0200 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> Message-ID: On Sat, Jul 2, 2022 at 8:48 PM Kim Barrett wrote: > > On Jul 2, 2022, at 12:57 PM, Thomas Stüfe > wrote: > > > > On Sat, Jul 2, 2022 at 5:22 PM Kim Barrett > wrote: > > > On Jun 30, 2022, at 1:21 AM, David Holmes > wrote: > > > > > > Hi Thomas, > > > > > > On 30/06/2022 2:57 pm, Thomas Stüfe wrote: > > >> Hi, > > >> several functions in the os:: name scope are deliberately named like > the official counterparts they replace: > > >> os::malloc, os::free, os::strdup, os::realloc, os::recv, os::send, > os::connect, os::signal... > > >> There may be more. Some of them argument-match their counterparts > (e.g. os::free), while others don't. > > >> Since the os:: variants can be called inside the os:: namespace with > omitting the leading os::, name confusions are possible.
"free(p)" means > something different in global scope or inside an os:: function. > > >> This can lead to problems that are difficult to find, e.g., > mismatched (os::)malloc->(os::)free with the potential to corrupt the > C-heap: > > >> [...] So I wonder if we should do that. Rename os:: to > something like os::. And what the prefix or suffix would > be. > > > > > > It annoys me that we have to do such things. It would have made more > sense for the standard C library routines to have a prefix that marked them > as reserved identifiers rather than polluting the global namespace the way > they did. But no one thinks of these things initially and by the time it is > standardised it is too late to make such changes. :( > > > > > > I'm not sure this is a problem we have to address, but if we choose to > then I think we should try to make a general improvement to the way os is > used. > > > > > > Maybe, as I think has been suggested before, we can move these out of > the os class as they are not really about the os but the C library, and > then any renaming that includes a prefix may not look so bad? > > > > > > Maybe lib::C_free(), lib::C_malloc() etc? > > > > A reminder that JDK-8214976 allows us to "poison" a function, other than > in > > explicitly marked places. When I introduced that feature I only marked a > small > > number of functions that were "easy". There are many other functions > that seem > > like good candidates, but had more fan-out than I wanted in that change. > For > > example, we could mark ::malloc, ::calloc, ::free, etc. as normally > forbidden. > > > > > > I really like this, in addition to the name change. Note however that we > may still need to expose the raw functions to hotspot code for outlier > cases. Something like "os::raw_malloc()" and "os::raw_free()" for "when you > really really mean it". > > Remember that JDK-8214976 provides a way to disable the poisoning in a > specific context. 
> So you can still use "::malloc()"
(for example) where you need it; you > just need to also say > that's really what you meant. > > What I originally meant was to provide an os::raw_malloc() that internally disables the poisoning, then calls ::malloc, but without NMT headers or anything. The "raw" in the name would indicate the intent. But now I think that's too complex. I can think of only two places that need raw ::malloc - the NMT pre-initialization code and os::malloc itself - and they can just disable poisoning inside and add a comment. So a general purpose os::raw_malloc() would not be needed. From duke at openjdk.org Sun Jul 3 00:59:46 2022 From: duke at openjdk.org (kristylee88) Date: Sun, 3 Jul 2022 00:59:46 GMT Subject: RFR: 8289230: Move PlatformXXX class declarations out of os_xxx.hpp [v4] In-Reply-To: References: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com> Message-ID: On Sat, 2 Jul 2022 04:26:48 GMT, Ioi Lam wrote: >> There are only two implementations of these classes (one for windows, and one for posix): >> >> - PlatformEvent >> - PlatformParker >> - PlatformMutex >> - PlatformMonitor >> - ThreadCrashProtection >> >> Before this PR, these classes are declared in os_xxx.hpp. This causes excessive inclusion of the large header file os.hpp by popular headers such as mutex.hpp, which needs only the declaration of PlatformMutex but not the other stuff in os.hpp >> >> This PR moves the declarations to park_posix.hpp, mutex_posix.hpp, etc. >> >> Note: ideally, the definition of PlatformParker/PlatformEvent should be moved to park_posix.cpp, and PlatformMutex/PlatformMonitor should be moved to mutex_posix.cpp. However, the definition of these 4 classes are intertwined, so I'll leave them inside os_posix.cpp for now. (Same for the Windows version). > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - removed unrelated newline change > - Merge branch 'master' into 8289230-move-Platform-classes-out-of-os-xxx-hpp > - @coleenp comments > - fixed comments > - fixed windows > - Moved PlatformMutex/PlatformMonitor > - move-PlatformParker-out-of-os-xxx-hpp > - Moved ThreadCrashProtection Marked as reviewed by kristylee88 at github.com (no known OpenJDK username). ------------- PR: https://git.openjdk.org/jdk/pull/9303 From duke at openjdk.org Sun Jul 3 01:28:44 2022 From: duke at openjdk.org (kristylee88) Date: Sun, 3 Jul 2022 01:28:44 GMT Subject: RFR: 8289230: Move PlatformXXX class declarations out of os_xxx.hpp [v4] In-Reply-To: References: <0v7-TE5YMMz_zYMiuxdpTNpFcfCWqX-eG9l5R0uSvHk=.7b09cfd0-c85a-48bb-bf54-581496e2d996@github.com> Message-ID: <0RDnszyg_IHLT64-8cM28uYDlMl1iJiUrnvUAzYtRu0=.2616eaaf-3baf-4c31-9b95-a181598046ec@github.com> On Sat, 2 Jul 2022 04:26:48 GMT, Ioi Lam wrote: >> There are only two implementations of these classes (one for windows, and one for posix): >> >> - PlatformEvent >> - PlatformParker >> - PlatformMutex >> - PlatformMonitor >> - ThreadCrashProtection >> >> Before this PR, these classes are declared in os_xxx.hpp. This causes excessive inclusion of the large header file os.hpp by popular headers such as mutex.hpp, which needs only the declaration of PlatformMutex but not the other stuff in os.hpp >> >> This PR moves the declarations to park_posix.hpp, mutex_posix.hpp, etc. >> >> Note: ideally, the definition of PlatformParker/PlatformEvent should be moved to park_posix.cpp, and PlatformMutex/PlatformMonitor should be moved to mutex_posix.cpp. However, the definition of these 4 classes are intertwined, so I'll leave them inside os_posix.cpp for now. (Same for the Windows version). 
> > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - removed unrelated newline change > - Merge branch 'master' into 8289230-move-Platform-classes-out-of-os-xxx-hpp > - @coleenp comments > - fixed comments > - fixed windows > - Moved PlatformMutex/PlatformMonitor > - move-PlatformParker-out-of-os-xxx-hpp > - Moved ThreadCrashProtection Marked as reviewed by kristylee88 at github.com (no known OpenJDK username). ------------- PR: https://git.openjdk.org/jdk/pull/9303 From thomas.stuefe at gmail.com Sun Jul 3 08:47:28 2022 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Sun, 3 Jul 2022 10:47:28 +0200 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> Message-ID: I am preparing a patch to forbid C-heap allocation functions in hotspot as you proposed (https://github.com/openjdk/jdk/pull/9356). Interestingly, uses of forbidden functions are not detected on every platform. I found that if I compile on Ubuntu 20.04 with gcc 10.3, it does not complain about "realpath" even though I forbade it. If I build on Alpine with gcc 10.3.1, it finds occurrences of realpath. This may have to do with the way realpath is declared: glibc: extern char *realpath (const char *__restrict __name, char *__restrict __resolved) __THROW __wur; (__THROW becomes throw()) muslc: char *realpath (const char *__restrict, char *__restrict); As a test, I added "throw()" to the realpath prototype in "FORBID_C_FUNCTION", but that did not help either. gcc just did not pick up my use of raw realpath.
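
For readers unfamiliar with the mechanism under discussion, the following is a minimal sketch of how such "poisoning" can be done with gcc. It assumes gcc's `warning` function attribute and the `-Wattribute-warning` diagnostic group; it is not the HotSpot definition of FORBID_C_FUNCTION or ALLOW_C_FUNCTION. All `DEMO_` and `demo_` names are hypothetical, and a stand-in function is poisoned instead of a real C library function so that the sketch does not depend on how a particular libc declares its prototypes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical raw allocation function that callers should normally avoid.
extern "C" void* demo_raw_alloc(std::size_t size);

#if defined(__GNUC__)
// Assumption: gcc's `warning` attribute. Redeclaring the function with it
// asks the compiler to diagnose calls that survive to code generation.
#define DEMO_FORBID(signature, alternative) \
  extern "C" __attribute__((__warning__(alternative))) signature;
// Silence that diagnostic only around the statements passed in.
#define DEMO_ALLOW(...)                                     \
  _Pragma("GCC diagnostic push")                            \
  _Pragma("GCC diagnostic ignored \"-Wattribute-warning\"") \
  __VA_ARGS__                                               \
  _Pragma("GCC diagnostic pop")
#else
// Other compilers: this sketch performs no poisoning at all.
#define DEMO_FORBID(signature, alternative)
#define DEMO_ALLOW(...) __VA_ARGS__
#endif

DEMO_FORBID(void* demo_raw_alloc(std::size_t size), "use demo_tracked_alloc")

// Running total of bytes requested through the sanctioned wrapper.
static std::size_t demo_bytes_requested = 0;

// The sanctioned wrapper: the one place that is allowed to call the raw function.
void* demo_tracked_alloc(std::size_t size) {
  assert(size > 0);
  demo_bytes_requested += size;
  DEMO_ALLOW(return demo_raw_alloc(size);)
}

// Definition of the raw function; here it simply forwards to the C library.
extern "C" void* demo_raw_alloc(std::size_t size) {
  return std::malloc(size);
}
```

With gcc, a direct call to `demo_raw_alloc` outside `DEMO_ALLOW` would be expected to produce a warning carrying the text "use demo_tracked_alloc". The gcc documentation states that the attribute only diagnoses calls that are not eliminated by dead code elimination or other optimizations, so a call that is rewritten or removed before that point can go unreported.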
On Sat, Jul 2, 2022 at 8:48 PM Kim Barrett wrote: > > On Jul 2, 2022, at 12:57 PM, Thomas Stüfe > wrote: > > > > On Sat, Jul 2, 2022 at 5:22 PM Kim Barrett > wrote: > > > On Jun 30, 2022, at 1:21 AM, David Holmes > wrote: > > > > > > Hi Thomas, > > > > > > On 30/06/2022 2:57 pm, Thomas Stüfe wrote: > > >> Hi, > > >> several functions in the os:: name scope are deliberately named like > the official counterparts they replace: > > >> os::malloc, os::free, os::strdup, os::realloc, os::recv, os::send, > os::connect, os::signal... > > >> There may be more. Some of them argument-match their counterparts > (e.g. os::free), while others don't. > > >> Since the os:: variants can be called inside the os:: namespace with > omitting the leading os::, name confusions are possible. "free(p)" means > something different in global scope or inside an os:: function. > > >> This can lead to problems that are difficult to find, e.g., > mismatched (os::)malloc->(os::)free with the potential to corrupt the > C-heap: > > >> [...] So I wonder if we should do that. Rename os:: to > something like os::. And what the prefix or suffix would > be. > > > > > > It annoys me that we have to do such things. It would have made more > sense for the standard C library routines to have a prefix that marked them > as reserved identifiers rather than polluting the global namespace the way > they did. But no one thinks of these things initially and by the time it is > standardised it is too late to make such changes. :( > > > > > > I'm not sure this is a problem we have to address, but if we choose to > then I think we should try to make a general improvement to the way os is > used. > > > > > > Maybe, as I think has been suggested before, we can move these out of > the os class as they are not really about the os but the C library, and > then any renaming that includes a prefix may not look so bad? > > > > > > Maybe lib::C_free(), lib::C_malloc() etc?
> > > > A reminder that JDK-8214976 allows us to "poison" a function, other than > in > > explicitly marked places. When I introduced that feature I only marked a > small > > number of functions that were "easy". There are many other functions > that seem > > like good candidates, but had more fanout than I wanted in that change. > For > > example, we could mark ::malloc, ::calloc, ::free, &etc as normally > forbidden. > > > > > > I really like this, in addition to the name change. Note however that we > may still need to expose the raw functions to hotspot code for outlier > cases. Something like "os::raw_malloc()" and "os::raw_free()" for "when you > really really mean it". > > Remember that JDK-8214976 provides a way to disable the poisoning in a > specific context. > So you can still use "::malloc()" (for example) where you need it; you > just need to also say > that's really what you meant. > > From stuefe at openjdk.org Sun Jul 3 10:45:24 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 3 Jul 2022 10:45:24 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings Message-ID: [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. We now forbid all functions that return C-heap memory, even if that is only optional, as with `realpath`. There may be more such functions, but these are all I know off the top of my head. We forbid them even if they are exotic, both to prevent developers from using them in the future and to keep them from creeping in via system macros.
I found a number of places where raw allocation functions were used, mostly strdup. Where I was confident that it works, I changed those places to use os::xxx; where I saw that we really must use the raw functions, I marked them with ALLOW_C_FUNCTION.

Places that allow raw C functions:
- decoder on Linux, since the C++ demangler returns raw C heap
- realpath, in conjunction with allowing real free for the returned buffer
- ZGC uses posix_memalign for a static global buffer that is never deleted. Continuing to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point
- UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities
- obviously os::malloc and friends
- NMT pre-initialization code, because of circularities
- In gtest main function - I think gtest should always work, even if os::malloc is broken.

Places I fixed:
- ZGC, mountpoint string handling
- In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay
- gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter
- there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this
- A couple of places in gtests.

Note: wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into their own header.

----

Tests: I built and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested the build on Alpine x64.

GHAs are in progress.
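
Why a raw strdup must not be paired with os::free (and vice versa) can be illustrated with a toy wrapper. The sketch below assumes, as a simplification, a tracking allocator that places a header in front of every payload, in the spirit of the NMT headers mentioned in this thread. `demo::os`, `kHeaderSize` and `live_blocks` are hypothetical names; this is not HotSpot's os class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace demo {

// Space reserved in front of every payload; using the maximum fundamental
// alignment keeps the payload suitably aligned.
constexpr std::size_t kHeaderSize = alignof(std::max_align_t);
static_assert(sizeof(std::size_t) <= kHeaderSize, "size field must fit in header");

struct os {
  static std::size_t live_blocks;  // number of blocks handed out and not yet freed

  static void* malloc(std::size_t size) {
    char* outer = static_cast<char*>(std::malloc(kHeaderSize + size));
    if (outer == nullptr) return nullptr;
    std::memcpy(outer, &size, sizeof(size));  // record the payload size in the header
    ++live_blocks;
    return outer + kHeaderSize;               // callers only ever see the payload
  }

  static void free(void* payload) {
    if (payload == nullptr) return;
    assert(live_blocks > 0);
    --live_blocks;
    // Step back to the start of the block the C library actually handed out.
    std::free(static_cast<char*>(payload) - kHeaderSize);
  }

  static char* strdup(const char* s) {
    const std::size_t len = std::strlen(s) + 1;
    // Unqualified call inside the class: name lookup finds os::malloc here,
    // not the C library function of the same name.
    char* copy = static_cast<char*>(malloc(len));
    if (copy != nullptr) std::memcpy(copy, s, len);
    return copy;
  }
};

std::size_t os::live_blocks = 0;

}  // namespace demo
```

Passing the pointer returned by `demo::os::strdup` to `std::free` would hand the C library an address `kHeaderSize` bytes past the start of the block it allocated, which is undefined behavior. Conversely, passing a raw `strdup` result to `demo::os::free` would step back into memory that was never a header.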
[1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html ------------- Commit messages: - explicitly add globalDefinitions to os_linux.cpp - Fix ppcle - JDK-8289633-Forbid-raw-C-heap-allocation-functions-in-hotspot-and-fix-findings Changes: https://git.openjdk.org/jdk/pull/9356/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9356&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289633 Stats: 81 lines in 20 files changed: 38 ins; 1 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/9356.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9356/head:pull/9356 PR: https://git.openjdk.org/jdk/pull/9356 From kbarrett at openjdk.org Sun Jul 3 13:17:42 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 3 Jul 2022 13:17:42 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings In-Reply-To: References: Message-ID: On Sun, 3 Jul 2022 08:04:09 GMT, Thomas Stuefe wrote: > [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. > > We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. > > I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. 
> > Places that allow raw C functions: > - decoder on Linux, since the C++ demangler returns raw C heap > - realpath, in conjunction with allowing real free for the returned buffer > - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point > - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities > - obviously os::malloc and friends > - NMT pre-initialization code because circularities > - In gtest main function - I think gtest should work always, even if os::malloc is broken. > > Places I fixed: > - ZGC, mountpoint string handling > - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay > - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter > - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this > - A couple of places in gtests. > > Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. > > ---- > > Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. > > GHAs are in work. > > > [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html A few small nits, but generally good. One stylistic question is the naming of functions in ALLOW_xxx macro uses. Sometimes they are explicitly in the global namespace (`::foo`), sometimes not, and sometimes the macro's name argument and the use(s) in the permissive code are different. 
At least the latter probably should be made consistent. Regularizing others might be worthwhile, in which case I suggest explicit global namespace qualification. Not sure how strongly I feel about that. src/hotspot/cpu/ppc/macroAssembler_ppc_sha.cpp line 28: > 26: #ifdef AIX > 27: #include "runtime/os.hpp" // malloc > 28: #endif Is it actually important to make the inclusion conditional? Also, the Style Guide says conditional includes go at the end. src/hotspot/share/runtime/os.cpp line 739: > 737: void* const old_outer_ptr = MemTracker::record_free(memblock); > 738: > 739: ALLOW_C_FUNCTION(::realloc, ::free(old_outer_ptr);) s/realloc/free/ - unfortunately, the existing macro implementations only use the name for "documentation" purposes. src/hotspot/share/utilities/globalDefinitions.hpp line 178: > 176: FORBID_C_FUNCTION(void free(void *ptr), "use os::free"); > 177: FORBID_C_FUNCTION(void* realloc(void *ptr, size_t size), "use os::realloc"); > 178: FORBID_C_FUNCTION(char* strdup(const char *s), "use os::realloc"); s/realloc/strdup/ src/hotspot/share/utilities/globalDefinitions.hpp line 179: > 177: FORBID_C_FUNCTION(void* realloc(void *ptr, size_t size), "use os::realloc"); > 178: FORBID_C_FUNCTION(char* strdup(const char *s), "use os::realloc"); > 179: FORBID_C_FUNCTION(char* strndup(const char *s, size_t n), "use os::strdup"); I take it there are no calls to `::strndup`? If there were, we should probably add `os::strndup`. ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/9356 From kim.barrett at oracle.com Sun Jul 3 20:59:22 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 3 Jul 2022 20:59:22 +0000 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? 
In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> Message-ID: <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> > On Jul 3, 2022, at 4:47 AM, Thomas Stüfe wrote: > > I am preparing a patch to forbid C-heap allocation functions in hotspot as you proposed (https://github.com/openjdk/jdk/pull/9356). > > Interestingly, not all occurrences of forbidden functions are found everywhere. I found that if I compile on Ubuntu 20.04 with gcc 10.3., it does not complain about "realpath" even though I forbade it. If I build on Alpine, gcc 10.3.1, it finds occurrences of realpath. In which build variants? All? Or only fastdebug? If the latter, this might be another case of _FORTIFY_SOURCE rewriting the call first, dodging the warning. This is mentioned in the comment describing the gcc implementation of FORBID_C_FUNCTION. Note also that I didn't find a way to provide this feature for Windows/VisualStudio, and the clang implementation didn't get tested by me. (Oracle only uses clang when building for MacOS using Xcode, and no released version of Xcode has a sufficiently recent version of clang to have the needed feature set.) From stuefe at openjdk.org Mon Jul 4 07:11:42 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Jul 2022 07:11:42 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings In-Reply-To: References: Message-ID: On Sun, 3 Jul 2022 13:04:53 GMT, Kim Barrett wrote: >> [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. >> >> We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`.
Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. >> >> I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. >> >> Places that allow raw C functions: >> - decoder on Linux, since the C++ demangler returns raw C heap >> - realpath, in conjunction with allowing real free for the returned buffer >> - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point >> - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities >> - obviously os::malloc and friends >> - NMT pre-initialization code because circularities >> - In gtest main function - I think gtest should work always, even if os::malloc is broken. >> >> Places I fixed: >> - ZGC, mountpoint string handling >> - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay >> - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter >> - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this >> - A couple of places in gtests. >> >> Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. 
>> >> ---- >> >> Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. >> >> GHAs are in work. >> >> >> [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html > > src/hotspot/share/utilities/globalDefinitions.hpp line 179: > >> 177: FORBID_C_FUNCTION(void* realloc(void *ptr, size_t size), "use os::realloc"); >> 178: FORBID_C_FUNCTION(char* strdup(const char *s), "use os::realloc"); >> 179: FORBID_C_FUNCTION(char* strndup(const char *s, size_t n), "use os::strdup"); > > I take it there are no calls to `::strndup`? If there were, we should probably add `os::strndup`. There is one case, in share/compiler/directivesParser.cpp, but commented out. ------------- PR: https://git.openjdk.org/jdk/pull/9356 From thomas.stuefe at gmail.com Mon Jul 4 07:18:37 2022 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 4 Jul 2022 09:18:37 +0200 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> Message-ID: On Sun, Jul 3, 2022 at 10:59 PM Kim Barrett wrote: > > On Jul 3, 2022, at 4:47 AM, Thomas St?fe > wrote: > > > > I am preparing a patch to forbid C-heap allocation functions in hotspot > as you proposed (https://github.com/openjdk/jdk/pull/9356). > > > > Interestingly, not all occurrences of forbidden functions are found > everywhere. I found that if I compile on Ubuntu 20.04 with gcc 10.3., it > does not complain about "realpath" even though I forbade it. If I build on > Alpine, gcc 10.3.1, it finds occurrences of realpath. > > In which build variants? All? Or only fastdebug? If the latter, this > might be another case of > _FORTIFY_SOURCE rewriting the call first, dodging the warning. 
This is > mentioned in the > comment describing the gcc implementation of FORBID_C_FUNCTION. > > No, it also fails to recognize realpath in release builds. Just to be sure, I tested the most important other candidates (malloc, free, realloc, calloc, strdup), and those all work. > Note also that I didn't find a way to provide this feature for > Windows/VisualStudio, and the > clang implementation didn't get tested by me. (Oracle only uses clang > when building for > MacOS using Xcode, and no released version of Xcode has a sufficiently > recent version > of clang to have the needed feature set.) > > One thing that I dislike is that this requires including globalDefinitions.hpp. Not every cpp or hpp file may include that. Also, since in theory, to have 100% coverage, every file and header should include globalDefinitions.hpp, we may want to move these macros into their own file to avoid blowing up the dependencies. But other than that, I like this mechanism. It's very practical. Cheers, Thomas From stuefe at openjdk.org Mon Jul 4 07:26:33 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Jul 2022 07:26:33 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v2] In-Reply-To: References: Message-ID: > [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. > > We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know from the top of my head.
We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. > > I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. > > Places that allow raw C functions: > - decoder on Linux, since the C++ demangler returns raw C heap > - realpath, in conjunction with allowing real free for the returned buffer > - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point > - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities > - obviously os::malloc and friends > - NMT pre-initialization code because circularities > - In gtest main function - I think gtest should work always, even if os::malloc is broken. > > Places I fixed: > - ZGC, mountpoint string handling > - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay > - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter > - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this > - A couple of places in gtests. > > Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. > > ---- > > Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. > > GHAs are in work. 
> > > [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Forgot one.. - Review feedback Kim ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9356/files - new: https://git.openjdk.org/jdk/pull/9356/files/06274529..b93ca75f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9356&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9356&range=00-01 Stats: 11 lines in 4 files changed: 0 ins; 2 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/9356.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9356/head:pull/9356 PR: https://git.openjdk.org/jdk/pull/9356 From stuefe at openjdk.org Mon Jul 4 07:26:38 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Jul 2022 07:26:38 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v2] In-Reply-To: References: Message-ID: On Sun, 3 Jul 2022 13:08:24 GMT, Kim Barrett wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - Forgot one.. >> - Review feedback Kim > > src/hotspot/cpu/ppc/macroAssembler_ppc_sha.cpp line 28: > >> 26: #ifdef AIX >> 27: #include "runtime/os.hpp" // malloc >> 28: #endif > > Is it actually important to make the inclusion conditional? Also, the Style Guide says conditional includes go at the end. No, it's not. I removed it. > src/hotspot/share/runtime/os.cpp line 739: > >> 737: void* const old_outer_ptr = MemTracker::record_free(memblock); >> 738: >> 739: ALLOW_C_FUNCTION(::realloc, ::free(old_outer_ptr);) > > s/realloc/free/ - unfortunately, the existing macro implementations only use the name for "documentation" purposes.
Fixed > src/hotspot/share/utilities/globalDefinitions.hpp line 178: > >> 176: FORBID_C_FUNCTION(void free(void *ptr), "use os::free"); >> 177: FORBID_C_FUNCTION(void* realloc(void *ptr, size_t size), "use os::realloc"); >> 178: FORBID_C_FUNCTION(char* strdup(const char *s), "use os::realloc"); > > s/realloc/strdup/ Right ------------- PR: https://git.openjdk.org/jdk/pull/9356 From stuefe at openjdk.org Mon Jul 4 07:29:42 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Jul 2022 07:29:42 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v2] In-Reply-To: References: Message-ID: On Sun, 3 Jul 2022 13:13:51 GMT, Kim Barrett wrote: > A few small nits, but generally good. > > One stylistic question is the naming of functions in ALLOW_xxx macro uses. Sometimes they are explicitly in the global namespace (`::foo`), sometimes not, and sometimes the macro's name argument and the use(s) in the permissive code are different. At least the latter probably should be made consistent. Regularizing others might be worthwhile, in which case I suggest explicit global namespace qualification. Not sure how strongly I feel about that. Thank you @kimbarrett for the review. I fixed those places you found. Seems I rushed the patch a bit. I unified the names in the ALLOW_ macros to all use global namespace scope. Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/9356 From kim.barrett at oracle.com Mon Jul 4 07:55:46 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 4 Jul 2022 07:55:46 +0000 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? 
In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> Message-ID: > On Jul 4, 2022, at 3:18 AM, Thomas Stüfe wrote: > On Sun, Jul 3, 2022 at 10:59 PM Kim Barrett wrote: > > On Jul 3, 2022, at 4:47 AM, Thomas Stüfe wrote: > > > > I am preparing a patch to forbid C-heap allocation functions in hotspot as you proposed (https://github.com/openjdk/jdk/pull/9356). > > > > Interestingly, not all occurrences of forbidden functions are found everywhere. I found that if I compile on Ubuntu 20.04 with gcc 10.3., it does not complain about "realpath" even though I forbade it. If I build on Alpine, gcc 10.3.1, it finds occurrences of realpath. > > In which build variants? All? Or only fastdebug? If the latter, this might be another case of > _FORTIFY_SOURCE rewriting the call first, dodging the warning. This is mentioned in the > comment describing the gcc implementation of FORBID_C_FUNCTION. > > > No, it fails also on release to recognize realpath. Just to be sure I tested the most important other candidates (malloc, free, realloc, calloc, strdup) and those all work. Strange. Maybe there's some other similar rewrite going on? I'll try poking at this. > One thing that I dislike is that this requires including globalDefinitions.hpp. Not every cpp or hpp file may include that. Also, since in theory, to have 100% coverage, every file and header should include globalDefinitions.hpp, we may want to move these macros into an own file to avoid blowing up the dependencies. I thought about putting the forbiddings in a different file, but ultimately decided not. 1. globalDefinitions.hpp ends up being included nearly everywhere anyway, almost certainly in places where these functions would be used. It's currently such a dumping ground for miscellaneous and often unrelated things that it ends up being hard to avoid. 2.
globalDefinitions.hpp is where we do whatever conditionalization is needed to #include a lot of the relevant "system" and C library headers and perform some additional massaging of them. These poisonings may involve parameter and return types from those headers. From kim.barrett at oracle.com Mon Jul 4 08:03:52 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 4 Jul 2022 08:03:52 +0000 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> Message-ID: <0846DF3A-12D2-4EE9-BB2D-4378FA5B4527@oracle.com> > On Jul 4, 2022, at 3:55 AM, Kim Barrett wrote: > >> On Jul 4, 2022, at 3:18 AM, Thomas Stüfe wrote: One thing that I dislike is that this requires including globalDefinitions.hpp. Not every cpp or hpp file may include that. Also, since in theory, to have 100% coverage, every file and header should include globalDefinitions.hpp, we may want to move these macros into an own file to avoid blowing up the dependencies. > > I thought about putting the forbiddings in a different file, but ultimately decided not. > > 1. globalDefinitions.hpp ends up being included nearly everywhere anyway, almost certainly in places > where these functions would be used. It's currently such a dumping ground for miscellaneous and > often unrelated things that it ends up being hard to avoid. > > 2. globalDefinitions.hpp is where we do whatever conditionalization is needed to #include a lot of > the relevant "system" and C library headers and perform some additional massaging of them. > These poisonings may involve parameter and return types from those headers. One option is to put it in a separate file and use gcc's `-include` option to ensure it is included everywhere. But (2) is still an issue, so other refactoring of globalDefinitions.hpp would also be needed. (Not that such refactoring would be a bad thing, IMO.)
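[Editorial illustration, not part of the original message.] For readers who have not seen the forbid/allow mechanism this thread refers to, the following is a minimal sketch of the general idea, assuming a gcc 10+ toolchain as mentioned in the thread. The macro names `SKETCH_FORBID` and `SKETCH_ALLOW` and the wrapper `sketch_strdup` are hypothetical stand-ins chosen for this example; they are not the HotSpot definitions of `FORBID_C_FUNCTION` and `ALLOW_C_FUNCTION`, which may differ in detail. On other compilers the macros are defined as no-ops here.

```cpp
#include <cassert>
#include <stdlib.h>
#include <string.h>

// Hypothetical macros for illustration only; not the HotSpot definitions.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 10
// Redeclare a C function with a "warning" attribute, so that any call
// that survives to code generation is reported at compile time.
#define SKETCH_FORBID(signature, alternative) \
  extern "C" __attribute__((__warning__(alternative))) signature;
// Locally silence that warning around code that is explicitly permitted
// to call the forbidden function.
#define SKETCH_ALLOW(...)                                    \
  _Pragma("GCC diagnostic push")                             \
  _Pragma("GCC diagnostic ignored \"-Wattribute-warning\"")  \
  __VA_ARGS__                                                \
  _Pragma("GCC diagnostic pop")
#else
// Fallback for other toolchains: no checking at all.
#define SKETCH_FORBID(signature, alternative)
#define SKETCH_ALLOW(...) __VA_ARGS__
#endif

// Forbid raw strdup everywhere after this point in the translation unit.
SKETCH_FORBID(char* strdup(const char* s), "use sketch_strdup instead")

// Hypothetical project-level wrapper (a stand-in for something like
// os::strdup); it is the one place allowed to call the raw function.
inline char* sketch_strdup(const char* s) {
  SKETCH_ALLOW(return ::strdup(s);)
}
```

In this sketch a direct call to `::strdup` outside `SKETCH_ALLOW` would be expected to draw a compile-time warning on a supporting gcc, while calls routed through `sketch_strdup` stay quiet. Because the forbidding declaration has to be visible in every translation unit to be effective, the question of which header carries it (or whether it is injected with `-include`) matters, which is the point under discussion above.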
From thomas.stuefe at gmail.com Mon Jul 4 08:24:57 2022 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 4 Jul 2022 10:24:57 +0200 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: <0846DF3A-12D2-4EE9-BB2D-4378FA5B4527@oracle.com> References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> <0846DF3A-12D2-4EE9-BB2D-4378FA5B4527@oracle.com> Message-ID: On Mon, Jul 4, 2022 at 10:04 AM Kim Barrett wrote: > > On Jul 4, 2022, at 3:55 AM, Kim Barrett wrote: > > > >> On Jul 4, 2022, at 3:18 AM, Thomas Stüfe > wrote: One thing that I dislike is that this requires including > globalDefinitions.hpp. Not every cpp or hpp file may include that. Also, > since in theory, to have 100% coverage, every file and header should > include globalDefinitions.hpp, we may want to move these macros into an own > file to avoid blowing up the dependencies. > > > > I thought about putting the forbiddings in a different file, but > ultimately decided not. > > > > 1. globalDefinitions.hpp ends up being included nearly everywhere > anyway, almost certainly in places > > where these functions would be used. It's currently such a dumping > ground for miscellaneous and > > often unrelated things that it ends up being hard to avoid. > > > > 2. globalDefinitions.hpp is where we do whatever conditionalization is > needed to #include a lot of > > the relevant "system" and C library headers and perform some additional > massaging of them. > > These poisonings may involve parameter and return types from those > headers. > > One option is to put it in a separate file and use gcc's `-include` option > to ensure it is included > everywhere. But (2) is still an issue, so other refactoring of > globalDefinitions.hpp would also > be needed. (Not that such refactoring would be a bad thing, IMO.)
> > I also realized that we almost always need macros.hpp for platform dependent switch macros. It is probably okay as it is. Maybe we could add a rule to the style guide saying that every cpp file should include globalDefinitions.hpp, similar to windows.h on Windows. But the mechanism is already very good. I can live with it as it is now. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kbarrett at openjdk.org Mon Jul 4 08:29:32 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 4 Jul 2022 08:29:32 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v2] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 07:26:33 GMT, Thomas Stuefe wrote: >> [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. >> >> We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. >> >> I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. >> >> Places that allow raw C functions: >> - decoder on Linux, since the C++ demangler returns raw C heap >> - realpath, in conjunction with allowing real free for the returned buffer >> - ZGC uses posix_memalign for a static global buffer that never is deleted. 
Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point >> - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities >> - obviously os::malloc and friends >> - NMT pre-initialization code because circularities >> - In gtest main function - I think gtest should work always, even if os::malloc is broken. >> >> Places I fixed: >> - ZGC, mountpoint string handling >> - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay >> - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter >> - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this >> - A couple of places in gtests. >> >> Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. >> >> ---- >> >> Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. >> >> GHAs are in work. >> >> >> [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Forgot one.. > - Review feedback Kim Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9356 From iwalulya at openjdk.org Mon Jul 4 08:38:45 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 4 Jul 2022 08:38:45 GMT Subject: RFR: 8289520: G1: Remove duplicate checks in G1BarrierSetC1::post_barrier In-Reply-To: <3MnSORD3lOo4Et75TBnD9Y6RmysHMi07QUBKT3mqcsY=.554293bc-0a11-483c-b9c6-b6bc25313223@github.com> References: <3MnSORD3lOo4Et75TBnD9Y6RmysHMi07QUBKT3mqcsY=.554293bc-0a11-483c-b9c6-b6bc25313223@github.com> Message-ID: On Thu, 30 Jun 2022 12:08:59 GMT, Albert Mingkun Yang wrote: > Simple change of removing effectively dead code. > > Test: tier1-3 Marked as reviewed by iwalulya (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9333 From duke at openjdk.org Mon Jul 4 08:43:52 2022 From: duke at openjdk.org (Johannes Bechberger) Date: Mon, 4 Jul 2022 08:43:52 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. Looks good and simple ------------- Marked as reviewed by parttimenerd at github.com (no known OpenJDK username). PR: https://git.openjdk.org/jdk/pull/9334 From mbaesken at openjdk.org Mon Jul 4 09:09:38 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 4 Jul 2022 09:09:38 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 08:40:05 GMT, Johannes Bechberger wrote: > Looks good and simple Hi, thanks for the review ! May I have a second review ? 
------------- PR: https://git.openjdk.org/jdk/pull/9334 From maurizio.cimadamore at oracle.com Mon Jul 4 09:53:47 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 4 Jul 2022 10:53:47 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> Message-ID: Hi Wojtek, thanks for sharing this list, I think this is a good starting point to understand more about your use case. Last week I've been looking at "getrusage" (as you mentioned it in an earlier email), and I was surprised to see that the call took a pointer to a (fairly big) struct which then needed to be initialized with some thread-local state: https://man7.org/linux/man-pages/man2/getrusage.2.html I've looked at the implementation, and it seems to be doing memset on the user-provided struct pointer, plus all the fields assignment. Eyeballing the implementation, this does not seem to me like a "classic" use case where dropping transition would help much. I mean, surely dropping transitions would help shaving some nanoseconds off the call, but it doesn't seem to me that the call would be shortlived enough to make a difference. Do you have some benchmarks on this one? I did some [1] and the call overhead seemed to come up at 260ns/op - w/o transition you might perhaps be able to get to 250ns, but that's in the noise? As for getpid, note that you can do (since Java 9): ProcessHandle.current().pid(); I believe the impl caches the result, so it shouldn't even make the native call. Maurizio [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java On 02/07/2022 07:42, Wojciech Kudla wrote: > Hi Maurizio, > > Thanks for staying on this. 
> > > Could you please provide a rough list of the native calls you make > where you believe critical JNI is having a real impact in the > performance of your application? > > From the top of my head: > clock_gettime > recvmsg > recvmmsg > sendmsg > sendmmsg > select > getpid > getcpu > getrusage > > > Also, could you please tell us whether any of these calls need to > interact with Java arrays? > No arrays or objects of any type involved. Everything happens by the > means of passing raw pointers as longs and using other primitive types > as function arguments. > > > In other words, do you use critical JNI to remove the cost > associated with thread transitions, or are you also taking advantage > of accessing on-heap memory _directly_ from native code? > Criticial JNI natives are used solely to remove the cost of > transitions. We don't get anywhere near java heap in native code. > > In general I think it makes a lot of sense for Java as a > language/platform to have some guards around unsafe code, but on the > other hand the popularity of libraries employing Unsafe and their > success in more performance-oriented corners of software engineering > is a clear indicator there is a need for the JVM to provide access to > more low-level primitives and mechanisms. > I think it's entirely fair to tell developers that all bets are off > when they get into some non-idiomatic scenarios but please don't take > away a feature that greatly contributed to Java's success. > > Kind regards, > Wojtek > > On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore > wrote: > > Hi Wojciech, > picking up this thread again. After some internal discussion, we > realize that we don't know enough about your use case. 
While > re-enabling JNI critical would obviously provide a quick fix, > we're afraid that (a) developers might end up depending on JNI > critical when they don't need to (perhaps also unaware of the > consequences of depending on it) and (b) that there might actually > be _better_ (as in: much faster) solutions than using critical > native calls to address at least some of your use cases (that > seemed to be the case with the clock_gettime example you > mentioned). Could you please provide a rough list of the native > calls you make where you believe critical JNI is having a real > impact in the performance of your application? Also, could you > please tell us whether any of these calls need to interact with > Java arrays? In other words, do you use critical JNI to remove the > cost associated with thread transitions, or are you also taking > advantage of accessing on-heap memory _directly_ from native code? > > Regards > Maurizio > > On 13/06/2022 21:38, Wojciech Kudla wrote: >> Hi Mark, >> >> Thanks for your input and apologies for the delayed response. >> >> > If the platform included, say, an intrinsified >> System.nanoRealTime() >> method that returned clock_gettime(CLOCK_REALTIME), how much would >> that help developers in your unnamed industry? >> >> Exposing realtime clock with nanosecond granularity in the JDK >> would be a great step forward. I should have made it clear that I >> represent fintech corner (investment banking to be exact) but the >> issues my message touches upon span areas such as HPC, audio >> processing, gaming, and defense industry so it's not like we have >> an isolated case. >> >> > In a similar vein, if people are finding it necessary to >> "replace parts >> of NIO with hand-crafted native code"
then it would be interesting to >> understand what their requirements are >> >> As for the other example I provided with making very short lived >> syscalls such as recvmsg/recvmmsg the premise is getting access >> to hardware timestamps on the ingress and egress ends as well as >> enabling batch receive with a single syscall and otherwise >> exploiting features unavailable from the JDK (like access to CMSG >> interface, scatter/gather, etc). >> There are also other examples of calls that we'd love to make >> often and at lowest possible cost (ie. getrusage) but I'm not >> sure if there's a strong case for some of these ideas, that's why >> it might be worth looking into more generic approach for >> performance sensitive code. >> Hope this does better job at explaining where we're coming from >> than my previous messages. >> >> Thanks, >> W >> >> On Tue, Jun 7, 2022 at 6:31 PM wrote: >> >> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >> >> Yes for System.nanoTime(), but System.currentTimeMillis() >> reports >> >> CLOCK_REALTIME. >> > >> > Unfortunately System.currentTimeMillis() offers only >> millisecond >> > granularity which is the reason why our industry has to >> resort to >> > clock_gettime. >> >> If the platform included, say, an intrinsified >> System.nanoRealTime() >> method that returned clock_gettime(CLOCK_REALTIME), how much >> would >> that help developers in your unnamed industry? >> >> In a similar vein, if people are finding it necessary to >> "replace parts >> of NIO with hand-crafted native code" then it would be >> interesting to >> understand what their requirements are. Some simple >> enhancements to >> the NIO API would be much less costly to design and implement >> than a >> generalized user-level native-call intrinsification mechanism. >> >> - Mark >> -------------- next part -------------- An HTML attachment was scrubbed...
URL: From kbarrett at openjdk.org Mon Jul 4 10:45:40 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 4 Jul 2022 10:45:40 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v2] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 07:26:33 GMT, Thomas Stuefe wrote: >> [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. >> >> We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. >> >> I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. >> >> Places that allow raw C functions: >> - decoder on Linux, since the C++ demangler returns raw C heap >> - realpath, in conjunction with allowing real free for the returned buffer >> - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point >> - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities >> - obviously os::malloc and friends >> - NMT pre-initialization code because circularities >> - In gtest main function - I think gtest should work always, even if os::malloc is broken. 
>> >> Places I fixed: >> - ZGC, mountpoint string handling >> - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay >> - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter >> - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this >> - A couple of places in gtests. >> >> Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. >> >> ---- >> >> Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. >> >> GHAs are in work. >> >> >> [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Forgot one.. > - Review feedback Kim src/hotspot/os/linux/os_perf_linux.cpp line 788: > 786: jio_snprintf(buffer, PATH_MAX, "/proc/%s/exe", _entry->d_name); > 787: buffer[PATH_MAX - 1] = '\0'; > 788: ALLOW_C_FUNCTION(::realpath, return realpath(buffer, _exePath);) Shouldn't this be using `os::Posix::realpath`? ------------- PR: https://git.openjdk.org/jdk/pull/9356 From kim.barrett at oracle.com Mon Jul 4 10:46:26 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 4 Jul 2022 10:46:26 +0000 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? 
In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> Message-ID: > On Jul 4, 2022, at 3:18 AM, Thomas Stüfe wrote: > On Sun, Jul 3, 2022 at 10:59 PM Kim Barrett wrote: > > On Jul 3, 2022, at 4:47 AM, Thomas Stüfe wrote: > > > > I am preparing a patch to forbid C-heap allocation functions in hotspot as you proposed (https://github.com/openjdk/jdk/pull/9356). > > > > Interestingly, not all occurrences of forbidden functions are found everywhere. I found that if I compile on Ubuntu 20.04 with gcc 10.3., it does not complain about "realpath" even though I forbade it. If I build on Alpine, gcc 10.3.1, it finds occurrences of realpath. > > In which build variants? All? Or only fastdebug? If the latter, this might be another case of > _FORTIFY_SOURCE rewriting the call first, dodging the warning. This is mentioned in the > comment describing the gcc implementation of FORBID_C_FUNCTION. > > > No, it fails also on release to recognize realpath. Just to be sure I tested the most important other candidates (malloc, free, realloc, calloc, strdup) and those all work. It works (fails with expected warning) for me. gcc 11.2, in case that matters. The warning mechanism is only supported for gcc 10+. From rehn at openjdk.org Mon Jul 4 11:05:26 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 4 Jul 2022 11:05:26 GMT Subject: RFR: 8286957: Held monitor count [v4] In-Reply-To: References: Message-ID: > The current implementation do not count all monitor enter, counts high up in abstraction and causes a performance regression on aarch64 with some benchmarks due to C2 changes. > > This change makes the counting exact by pushing the counting down in the abstraction. > The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack".
> > An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. > > Fixed aarch64, x64, x86 and zero. > > Passes t1-8 Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into held-mon-count - Fixed var name - Merge branch 'master' into held-mon-count - Merge branch 'master' into held-mon-count - 8286957 - PR Baseline ------------- Changes: https://git.openjdk.org/jdk/pull/8945/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=03 Stats: 517 lines in 43 files changed: 301 ins; 143 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/8945.diff Fetch: git fetch https://git.openjdk.org/jdk pull/8945/head:pull/8945 PR: https://git.openjdk.org/jdk/pull/8945 From wkudla.kernel at gmail.com Mon Jul 4 11:23:35 2022 From: wkudla.kernel at gmail.com (Wojciech Kudla) Date: Mon, 4 Jul 2022 12:23:35 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> Message-ID: Thanks Maurizio, I raised this case mainly about clock_gettime and recvmsg/sendmsg, I think we're focusing on the wrong things here. Feel free to drop the two syscalls from the discussion entirely, but the main usecases I have been presenting throughout this thread definitely stand. Thanks On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore < maurizio.cimadamore at oracle.com> wrote: > Hi Wojtek, > thanks for sharing this list, I think this is a good starting point to > understand more about your use case. 
> > Last week I've been looking at "getrusage" (as you mentioned it in an > earlier email), and I was surprised to see that the call took a pointer to > a (fairly big) struct which then needed to be initialized with some > thread-local state: > > https://man7.org/linux/man-pages/man2/getrusage.2.html > > I've looked at the implementation, and it seems to be doing memset on the > user-provided struct pointer, plus all the fields assignment. Eyeballing > the implementation, this does not seem to me like a "classic" use case > where dropping transition would help much. I mean, surely dropping > transitions would help shaving some nanoseconds off the call, but it > doesn't seem to me that the call would be shortlived enough to make a > difference. Do you have some benchmarks on this one? I did some [1] and the > call overhead seemed to come up at 260ns/op - w/o transition you might > perhaps be able to get to 250ns, but that's in the noise? > > As for getpid, note that you can do (since Java 9): > > ProcessHandle.current().pid(); > > I believe the impl caches the result, so it shouldn't even make the native > call. > > Maurizio > > [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java > On 02/07/2022 07:42, Wojciech Kudla wrote: > > Hi Maurizio, > > Thanks for staying on this. > > > Could you please provide a rough list of the native calls you make where > you believe critical JNI is having a real impact in the performance of your > application? > > From the top of my head: > clock_gettime > recvmsg > recvmmsg > sendmsg > sendmmsg > select > getpid > getcpu > getrusage > > > Also, could you please tell us whether any of these calls need to > interact with Java arrays? > No arrays or objects of any type involved. Everything happens by the means > of passing raw pointers as longs and using other primitive types as > function arguments. 
> > > In other words, do you use critical JNI to remove the cost associated > with thread transitions, or are you also taking advantage of accessing > on-heap memory _directly_ from native code? > Criticial JNI natives are used solely to remove the cost of transitions. > We don't get anywhere near java heap in native code. > > In general I think it makes a lot of sense for Java as a language/platform > to have some guards around unsafe code, but on the other hand the > popularity of libraries employing Unsafe and their success in more > performance-oriented corners of software engineering is a clear indicator > there is a need for the JVM to provide access to more low-level primitives > and mechanisms. > I think it's entirely fair to tell developers that all bets are off when > they get into some non-idiomatic scenarios but please don't take away a > feature that greatly contributed to Java's success. > > Kind regards, > Wojtek > > On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < > maurizio.cimadamore at oracle.com> wrote: > >> Hi Wojciech, >> picking up this thread again. After some internal discussion, we realize >> that we don't know enough about your use case. While re-enabling JNI >> critical would obviously provide a quick fix, we're afraid that (a) >> developers might end up depending on JNI critical when they don't need to >> (perhaps also unaware of the consequences of depending on it) and (b) that >> there might actually be _better_ (as in: much faster) solutions than using >> critical native calls to address at least some of your use cases (that >> seemed to be the case with the clock_gettime example you mentioned). Could >> you please provide a rough list of the native calls you make where you >> believe critical JNI is having a real impact in the performance of your >> application? Also, could you please tell us whether any of these calls need >> to interact with Java arrays? 
In other words, do you use critical JNI to >> remove the cost associated with thread transitions, or are you also taking >> advantage of accessing on-heap memory _directly_ from native code? >> >> Regards >> Maurizio >> On 13/06/2022 21:38, Wojciech Kudla wrote: >> >> Hi Mark, >> >> Thanks for your input and apologies for the delayed response. >> >> > If the platform included, say, an intrinsified System.nanoRealTime() >> method that returned clock_gettime(CLOCK_REALTIME), how much would >> that help developers in your unnamed industry? >> >> Exposing realtime clock with nanosecond granularity in the JDK would be a >> great step forward. I should have made it clear that I represent fintech >> corner (investment banking to be exact) but the issues my message touches >> upon span areas such as HPC, audio processing, gaming, and defense industry >> so it's not like we have an isolated case. >> >> > In a similar vein, if people are finding it necessary to "replace parts >> of NIO with hand-crafted native code" then it would be interesting to >> understand what their requirements are >> >> As for the other example I provided with making very short lived syscalls >> such as recvmsg/recvmmsg the premise is getting access to hardware >> timestamps on the ingress and egress ends as well as enabling batch receive >> with a single syscall and otherwise exploiting features unavailable from >> the JDK (like access to CMSG interface, scatter/gather, etc). >> There are also other examples of calls that we'd love to make often and >> at lowest possible cost (ie. getrusage) but I'm not sure if there's a >> strong case for some of these ideas, that's why it might be worth looking >> into more generic approach for performance sensitive code. >> Hope this does better job at explaining where we're coming from than my >> previous messages.
>> >> Thanks, >> W >> >> On Tue, Jun 7, 2022 at 6:31 PM wrote: >> >>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>> >> CLOCK_REALTIME. >>> > >>> > Unfortunately System.currentTimeMillis() offers only millisecond >>> > granularity which is the reason why our industry has to resort to >>> > clock_gettime. >>> >>> If the platform included, say, an intrinsified System.nanoRealTime() >>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>> that help developers in your unnamed industry? >>> >>> In a similar vein, if people are finding it necessary to ?replace parts >>> of NIO with hand-crafted native code? then it would be interesting to >>> understand what their requirements are. Some simple enhancements to >>> the NIO API would be much less costly to design and implement than a >>> generalized user-level native-call intrinsification mechanism. >>> >>> - Mark >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shade at openjdk.org Mon Jul 4 11:50:30 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Jul 2022 11:50:30 GMT Subject: [jdk19] RFR: 8288759: GCC 12 fails to compile signature.cpp due to -Wstringop-overread In-Reply-To: References: Message-ID: <24YYf6LIsfOZYGAtTSGeGoegEcIs_uDjY7jMOZBA3Vk=.13801e01-637c-468a-853e-7ef4265f23a5@github.com> On Sat, 25 Jun 2022 08:33:22 GMT, Kim Barrett wrote: >> Trying to compile with GCC 12.1.1 (current Fedora Rawhide) yields this failure: >> >> >> In file included from /home/test/shipilev-jdk/src/hotspot/share/utilities/globalDefinitions_gcc.hpp:35, >> from /home/test/shipilev-jdk/src/hotspot/share/utilities/globalDefinitions.hpp:35, >> from /home/test/shipilev-jdk/src/hotspot/share/memory/allocation.hpp:29, >> from /home/test/shipilev-jdk/src/hotspot/share/classfile/classLoaderData.hpp:28, >> from /home/test/shipilev-jdk/src/hotspot/share/precompiled/precompiled.hpp:34: >> In function 'const void* memchr(const void*, int, size_t)', >> inlined from 'int SignatureStream::scan_type(BasicType)' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:343:32, >> inlined from 'void SignatureStream::next()' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:373:19, >> inlined from 'void SignatureIterator::do_parameters_on(T*) [with T = Fingerprinter]' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.hpp:635:41, >> inlined from 'void SignatureIterator::do_parameters_on(T*) [with T = Fingerprinter]' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.hpp:629:6, >> inlined from 'void Fingerprinter::compute_fingerprint_and_return_type(bool)' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:169:19: > > src/hotspot/share/runtime/signature.cpp line 328: > >> 326: >> 327: PRAGMA_DIAG_PUSH >> 328: PRAGMA_STRINGOP_OVERREAD_IGNORED > > Don't make this change. The warning is indicating an actual problem with the code. 
> The while loop on line 338 may terminate with `end == limit` if the string consists of just a sequence of '[' and then ends. If the loop ends for that reason, we later read `base[limit]`, invoking UB as limit is the length of base. As a proof of concept, adding
>
>     if (end >= limit) return limit;
>
> after the while loop makes the warning go away. I have no idea what the correct thing to do for this might be. Returning limit might be wrong; I just used that to verify this issue is the source of the warning.

You're right, this is a legit warning. I see other code in `signature.cpp` that handles `JVM_SIGNATURE_ARRAY` scans and checks whether we ended up scanning the string completely. We should do the same here. Let me see...

-------------

PR: https://git.openjdk.org/jdk19/pull/49

From maurizio.cimadamore at oracle.com Mon Jul 4 11:59:09 2022
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Mon, 4 Jul 2022 12:59:09 +0100
Subject: Obsoleting JavaCritical
In-Reply-To:
References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com>
 <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com>
 <20220607103108.900830823@eggemoggin.niobe.net>
 <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com>
Message-ID: <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com>

Hi,
while I'm not an expert with some of the IO calls you mention (some of my colleagues are more knowledgeable in this area, so I'm sure they will have more info), my general sense is that, as with getrusage, if there is a system call involved, you already pay a hefty price for the user-to-kernel transition. On my machine this seems to cost around 200ns. In these cases, using JNI critical to shave off a dozen nanoseconds (at best!) seems just not worth it.

So, of the functions in your list, the ones for which I *believe* dropping transitions would have the most effect are (if we exclude getpid, for which another approach is possible) clock_gettime and getcpu, as they might use vDSO [1], which typically brings the performance of these calls closer to calls to shared lib functions.

If you have examples, e.g. where performance of recvmsg (or related calls) varies significantly between base JNI and critical JNI, please send them our way; I'm sure some of my colleagues would be interested to take a look.

Popping back a couple of levels, I think it would be helpful to also define what's an acceptable regression in this context. Of course, in an ideal world, we'd like to see no performance regression at all. But JNI critical is an unsupported interface, which might misbehave with modern garbage collectors (e.g. ZGC) and that requires quite a bit of internal complexity which might, in the medium/long run, hinder the evolution of the Java platform (all these things have _some_ cost, even if the cost is not directly material to developers). In this vein, I think calls like clock_gettime tend to be more problematic: as they complete very quickly, you see the cost of transitions a lot more. In other cases, where syscalls are involved, the costs associated with transitions are more likely to be "in the noise". Of course if we look at absolute numbers, dropping transitions would always yield "faster" code; but at the same time, going from 250ns to 245ns is very unlikely to result in a visible performance difference when considering an application as a whole, so I think it's critical here to decide _which_ use cases to prioritize.

I think a good outcome of this discussion would be if we could come to some shared understanding of which native calls are truly problematic (e.g.
clock_gettime-like), and then for the JDK to provide better (and more maintainable) alternatives for those (which might even be faster than using critical JNI).

Thanks
Maurizio

[1] - https://man7.org/linux/man-pages/man7/vdso.7.html

On 04/07/2022 12:23, Wojciech Kudla wrote:
> Thanks Maurizio,
>
> I raised this case mainly about clock_gettime and recvmsg/sendmsg, I
> think we're focusing on the wrong things here. Feel free to drop the
> two syscalls from the discussion entirely, but the main usecases I
> have been presenting throughout this thread definitely stand.
>
> Thanks
>
>
> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore
> wrote:
>
>     Hi Wojtek,
>     thanks for sharing this list, I think this is a good starting
>     point to understand more about your use case.
>
>     Last week I've been looking at "getrusage" (as you mentioned it in
>     an earlier email), and I was surprised to see that the call took a
>     pointer to a (fairly big) struct which then needed to be
>     initialized with some thread-local state:
>
>     https://man7.org/linux/man-pages/man2/getrusage.2.html
>
>     I've looked at the implementation, and it seems to be doing memset
>     on the user-provided struct pointer, plus all the fields
>     assignment. Eyeballing the implementation, this does not seem to
>     me like a "classic" use case where dropping transition would help
>     much. I mean, surely dropping transitions would help shaving some
>     nanoseconds off the call, but it doesn't seem to me that the call
>     would be shortlived enough to make a difference. Do you have some
>     benchmarks on this one? I did some [1] and the call overhead
>     seemed to come up at 260ns/op - w/o transition you might perhaps
>     be able to get to 250ns, but that's in the noise?
>
>     As for getpid, note that you can do (since Java 9):
>
>     ProcessHandle.current().pid();
>
>     I believe the impl caches the result, so it shouldn't even make
>     the native call.
>
>     Maurizio
>
>     [1] -
>     http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java
>
>     On 02/07/2022 07:42, Wojciech Kudla wrote:
>>     Hi Maurizio,
>>
>>     Thanks for staying on this.
>>
>>     > Could you please provide a rough list of the native calls you
>>     > make where you believe critical JNI is having a real impact in
>>     > the performance of your application?
>>
>>     Off the top of my head:
>>     clock_gettime
>>     recvmsg
>>     recvmmsg
>>     sendmsg
>>     sendmmsg
>>     select
>>     getpid
>>     getcpu
>>     getrusage
>>
>>     > Also, could you please tell us whether any of these calls need
>>     > to interact with Java arrays?
>>     No arrays or objects of any type involved. Everything happens by
>>     the means of passing raw pointers as longs and using other
>>     primitive types as function arguments.
>>
>>     > In other words, do you use critical JNI to remove the cost
>>     > associated with thread transitions, or are you also taking
>>     > advantage of accessing on-heap memory _directly_ from native code?
>>     Critical JNI natives are used solely to remove the cost of
>>     transitions. We don't get anywhere near java heap in native code.
>>
>>     In general I think it makes a lot of sense for Java as a
>>     language/platform to have some guards around unsafe code, but on
>>     the other hand the popularity of libraries employing Unsafe and
>>     their success in more performance-oriented corners of software
>>     engineering is a clear indicator there is a need for the JVM to
>>     provide access to more low-level primitives and mechanisms.
>>     I think it's entirely fair to tell developers that all bets are
>>     off when they get into some non-idiomatic scenarios but please
>>     don't take away a feature that greatly contributed to Java's success.
>>
>>     Kind regards,
>>     Wojtek
>>
>>     On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore
>>     wrote:
>>
>>         Hi Wojciech,
>>         picking up this thread again. After some internal discussion,
>>         we realize that we don't know enough about your use case.
>>         While re-enabling JNI critical would obviously provide a
>>         quick fix, we're afraid that (a) developers might end up
>>         depending on JNI critical when they don't need to (perhaps
>>         also unaware of the consequences of depending on it) and (b)
>>         that there might actually be _better_ (as in: much faster)
>>         solutions than using critical native calls to address at
>>         least some of your use cases (that seemed to be the case with
>>         the clock_gettime example you mentioned). Could you please
>>         provide a rough list of the native calls you make where you
>>         believe critical JNI is having a real impact in the
>>         performance of your application? Also, could you please tell
>>         us whether any of these calls need to interact with Java
>>         arrays? In other words, do you use critical JNI to remove the
>>         cost associated with thread transitions, or are you also
>>         taking advantage of accessing on-heap memory _directly_ from
>>         native code?
>>
>>         Regards
>>         Maurizio
>>
>>         On 13/06/2022 21:38, Wojciech Kudla wrote:
>>>         Hi Mark,
>>>
>>>         Thanks for your input and apologies for the delayed response.
>>>
>>>         > If the platform included, say, an intrinsified System.nanoRealTime()
>>>         > method that returned clock_gettime(CLOCK_REALTIME), how much would
>>>         > that help developers in your unnamed industry?
>>>
>>>         Exposing realtime clock with nanosecond granularity in the
>>>         JDK would be a great step forward. I should have made it
>>>         clear that I represent fintech corner (investment banking to
>>>         be exact) but the issues my message touches upon span areas
>>>         such as HPC, audio processing, gaming, and defense industry
>>>         so it's not like we have an isolated case.
>>>
>>>         > In a similar vein, if people are finding it necessary to "replace parts
>>>         > of NIO with hand-crafted native code" then it would be interesting to
>>>         > understand what their requirements are
>>>
>>>         As for the other example I provided with making very short
>>>         lived syscalls such as recvmsg/recvmmsg the premise is
>>>         getting access to hardware timestamps on the ingress and
>>>         egress ends as well as enabling batch receive with a single
>>>         syscall and otherwise exploiting features unavailable from
>>>         the JDK (like access to CMSG interface, scatter/gather, etc).
>>>         There are also other examples of calls that we'd love to
>>>         make often and at lowest possible cost (i.e. getrusage) but
>>>         I'm not sure if there's a strong case for some of these
>>>         ideas, that's why it might be worth looking into more
>>>         generic approach for performance sensitive code.
>>>         Hope this does better job at explaining where we're coming
>>>         from than my previous messages.
>>>
>>>         Thanks,
>>>         W
>>>
>>>         On Tue, Jun 7, 2022 at 6:31 PM wrote:
>>>
>>>             2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com:
>>>             >> Yes for System.nanoTime(), but System.currentTimeMillis() reports
>>>             >> CLOCK_REALTIME.
>>>             >
>>>             > Unfortunately System.currentTimeMillis() offers only millisecond
>>>             > granularity which is the reason why our industry has to resort to
>>>             > clock_gettime.
>>>
>>>             If the platform included, say, an intrinsified System.nanoRealTime()
>>>             method that returned clock_gettime(CLOCK_REALTIME), how much would
>>>             that help developers in your unnamed industry?
>>>
>>>             In a similar vein, if people are finding it necessary to "replace parts
>>>             of NIO with hand-crafted native code" then it would be interesting to
>>>             understand what their requirements are. Some simple enhancements to
>>>             the NIO API would be much less costly to design and implement than a
>>>             generalized user-level native-call intrinsification mechanism.
>>>
>>>             - Mark
>>>
From shade at openjdk.org Mon Jul 4 12:12:40 2022
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 4 Jul 2022 12:12:40 GMT
Subject: [jdk19] RFR: 8288759: GCC 12 fails to compile signature.cpp due to -Wstringop-overread [v2]
In-Reply-To:
References:
Message-ID:

> Trying to compile with GCC 12.1.1 (current Fedora Rawhide) yields this failure:
>
>
> In file included from /home/test/shipilev-jdk/src/hotspot/share/utilities/globalDefinitions_gcc.hpp:35,
>                  from /home/test/shipilev-jdk/src/hotspot/share/utilities/globalDefinitions.hpp:35,
>                  from /home/test/shipilev-jdk/src/hotspot/share/memory/allocation.hpp:29,
>                  from /home/test/shipilev-jdk/src/hotspot/share/classfile/classLoaderData.hpp:28,
>                  from /home/test/shipilev-jdk/src/hotspot/share/precompiled/precompiled.hpp:34:
> In function 'const void* memchr(const void*, int, size_t)',
>     inlined from 'int SignatureStream::scan_type(BasicType)' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:343:32,
>     inlined from 'void SignatureStream::next()' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:373:19,
>     inlined from 'void SignatureIterator::do_parameters_on(T*) [with T = Fingerprinter]' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.hpp:635:41,
>     inlined from 'void SignatureIterator::do_parameters_on(T*) [with T = Fingerprinter]' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.hpp:629:6,
>     inlined from 'void Fingerprinter::compute_fingerprint_and_return_type(bool)' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:169:19:

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision:

 - Better fix the actual warning
 - Merge branch 'master' into JDK-8288759-gcc12-string-overread
 - Fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk19/pull/49/files
  - new: https://git.openjdk.org/jdk19/pull/49/files/e74cede8..8209c0aa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk19&pr=49&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk19&pr=49&range=00-01

  Stats: 2768 lines in 119 files changed: 2065 ins; 329 del; 374 mod
  Patch: https://git.openjdk.org/jdk19/pull/49.diff
  Fetch: git fetch https://git.openjdk.org/jdk19 pull/49/head:pull/49

PR: https://git.openjdk.org/jdk19/pull/49

From shade at openjdk.org Mon Jul 4 12:41:44 2022
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 4 Jul 2022 12:41:44 GMT
Subject: [jdk19] RFR: 8288759: GCC 12 fails to compile signature.cpp due to -Wstringop-overread [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 25 Jun 2022 08:37:03 GMT, Kim Barrett wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>>  - Better fix the actual warning
>>  - Merge branch 'master' into JDK-8288759-gcc12-string-overread
>>  - Fix
>
> Changes requested by kbarrett (Reviewer).

@kimbarrett, @coleenp -- I redid the fix to fix the actual warning instead. I opted to return `limit` on the failure path + assert it does not actually happen in practice. It looks like returning `limit` is acceptable, as it rolls over to the "end of signature" on error. We can make that `fatal()` instead, but I don't like to penalize `release` builds unnecessarily.
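
For readers following along, here is a minimal standalone sketch of the bounds check discussed above. It is not the actual HotSpot change: `scan_array_type` and its parameters are hypothetical stand-ins for the array case of `SignatureStream::scan_type`, and the literal characters '[', 'L' and ';' are assumed here in place of the JVM signature constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the array case of SignatureStream::scan_type.
//   base  : signature bytes; only base[0 .. limit-1] may be read
//   start : index of the first '[' of the array type
//   limit : number of readable bytes in base
// Returns the index one past the scanned type, or limit when the input is
// malformed (for example, it consists only of '[' characters).
int scan_array_type(const char* base, int start, int limit) {
  assert(base != nullptr && 0 <= start && start <= limit);

  int end = start;
  // Skip the run of '[' characters without stepping past limit.
  while (end < limit && base[end] == '[') {
    end++;
  }

  // The loop ran off the end of the string: return before base[limit] is read.
  if (end >= limit) {
    return limit;
  }

  if (base[end] == 'L') {
    // Class element type: search for the closing ';' within the remaining bytes.
    const void* hit = std::memchr(base + end, ';',
                                  static_cast<std::size_t>(limit - end));
    if (hit == nullptr) {
      return limit;  // unterminated class name, treat as end of signature
    }
    return static_cast<int>(static_cast<const char*>(hit) - base) + 1;
  }

  // Primitive element type: a single character.
  return end + 1;
}
```

With this shape, a call such as `scan_array_type("[[", 0, 2)` returns `limit` instead of indexing one past the buffer, which is the situation the GCC 12 warning points at.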

-------------

PR: https://git.openjdk.org/jdk19/pull/49

From wkudla.kernel at gmail.com Mon Jul 4 12:50:47 2022
From: wkudla.kernel at gmail.com (Wojciech Kudla)
Date: Mon, 4 Jul 2022 13:50:47 +0100
Subject: Obsoleting JavaCritical
In-Reply-To: <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com>
References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com>
 <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com>
 <20220607103108.900830823@eggemoggin.niobe.net>
 <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com>
 <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com>
Message-ID:

Hi Maurizio,

You are correct that under normal circumstances syscalls that are not supported by vDSO are very heavy, but when we call recvmsg/sendmsg we don't even perform a syscall at all. High frequency trading shops employ kernel bypass for all network flows pretty much by default. The most popular solution here is OpenOnload used with Xilinx products. For a case when there's nothing to read from the RX ring, a JavaCritical JNI call to recvmsg completes in ~11ns vs 23ns for a standard JNI call with full transition.
Sorry, I've been in this for so long I kind of assumed it's implied.

Thanks,
W.

On Mon, Jul 4, 2022 at 12:59 PM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:

> Hi,
> while I'm not an expert with some of the IO calls you mention (some of my
> colleagues are more knowledgeable in this area, so I'm sure they will have
> more info), my general sense is that, as with getrusage, if there is a
> system call involved, you already pay a hefty price for the user to kernel
> transition. On my machine this seem to cost around 200ns. In these cases,
> using JNI critical to shave off a dozen of nanoseconds (at best!) seems
> just not worth it.
URL: From maurizio.cimadamore at oracle.com Mon Jul 4 13:27:13 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 4 Jul 2022 14:27:13 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: Thanks for the clarification, this is very helpful. I also assume that the case when "there's nothing to read" is common enough to make a difference? Maurizio On 04/07/2022 13:50, Wojciech Kudla wrote: > Hi Maurizio, > > You are correct that under normal circumstances sycalls that are not > supported by vDSO are very heavy but when we call recvmsg/sendmsg we > don't even perform a syscall at all. High frequency trading shops > employ kernel bypass for all network flows pretty much by default. The > most popular solution here is OpenOnload used with Xilinix products. > For a case when there's nothing to read from the RX ring a JavaCrtical > JNI call to recvmsg completes in ~11ns vs 23ns for a standard JNI call > with full transition. > Sorry, I've been in this for so long I kind of assumed it's implied. > > Thanks, > W. > > On Mon, Jul 4, 2022 at 12:59 PM Maurizio Cimadamore > wrote: > > Hi, > while I'm not an expert with some of the IO calls you mention > (some of my colleagues are more knowledgeable in this area, so I'm > sure they will have more info), my general sense is that, as with > getrusage, if there is a system call involved, you already pay a > hefty price for the user to kernel transition. On my machine this > seem to cost around 200ns. In these cases, using JNI critical to > shave off a dozen of nanoseconds (at best!) seems just not worth it. > > So, of the functions in your list, the ones in which I *believe*? 
> dropping transitions would have the most effect are (if we exclude > getpid, for which another approach is possible) clock_gettime and > getcpu, I believe, as they might use vdso [1], which typically > brings the performance of these call closer to calls to shared lib > functions. > > If you have examples e.g. where performance of recvmsg (or related > calls) varies significantly between base JNI and critical JNI, > please send them our way; I'm sure some of my colleagues would be > intersted to take a look. > > Popping back a couple of levels, I think it would be helpful to > also define what's an acceptable regression in this context. Of > course, in an ideal world,? we'd like to see no performance > regression at all. But JNI critical is an unsupported interface, > which might misbehave with modern garbage collectors (e.g. ZGC) > and that requires quite a bit of internal complexity which might, > in the medium/long run, hinder the evolution of the Java platform > (all these things have _some_ cost, even if the cost is not > directly material to developers). In this vein, I think calls like > clock_gettime tend to be more problematic: as they complete very > quickly, you see the cost of transitions a lot more. In other > cases, where syscalls are involved, the cost associated to > transitions are more likely to be "in the noise". Of course if we > look at absolute numbers, dropping transitions would always yield > "faster" code; but at the same time, going from 250ns to 245ns is > very unlikely to result in visible performance difference when > considering an application as a whole, so I think it's critical > here to decide _which_ use cases to prioritize. > > I think a good outcome of this discussion would be if we could > come to some shared understanding of which native calls are truly > problematic (e.g. 
clock_gettime-like), and then for the JDK to > provide better (and more maintainable) alternatives for those > (which might even be faster than using critical JNI). > > Thanks > Maurizio > > [1] - https://man7.org/linux/man-pages/man7/vdso.7.html > > > On 04/07/2022 12:23, Wojciech Kudla wrote: >> Thanks Maurizio, >> >> I raised this case mainly about clock_gettime and >> recvmsg/sendmsg, I think we're focusing on the wrong things here. >> Feel free to drop the two syscalls from the discussion entirely, >> but the main usecases I have been presenting throughout this >> thread definitely stand. >> >> Thanks >> >> >> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore >> wrote: >> >> Hi Wojtek, >> thanks for sharing this list, I think this is a good starting >> point to understand more about your use case. >> >> Last week I've been looking at "getrusage" (as you mentioned >> it in an earlier email), and I was surprised to see that the >> call took a pointer to a (fairly big) struct which then >> needed to be initialized with some thread-local state: >> >> https://man7.org/linux/man-pages/man2/getrusage.2.html >> >> >> I've looked at the implementation, and it seems to be doing >> memset on the user-provided struct pointer, plus all the >> fields assignment. Eyeballing the implementation, this does >> not seem to me like a "classic" use case where dropping >> transition would help much. I mean, surely dropping >> transitions would help shaving some nanoseconds off the call, >> but it doesn't seem to me that the call would be shortlived >> enough to make a difference. Do you have some benchmarks on >> this one? I did some [1] and the call overhead seemed to come >> up at 260ns/op - w/o transition you might perhaps be able to >> get to 250ns, but that's in the noise? >> >> As for getpid, note that you can do (since Java 9): >> >> ProcessHandle.current().pid(); >> >> I believe the impl caches the result, so it shouldn't even >> make the native call. 
>> >> Maurizio >> >> [1] - >> http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >> >> On 02/07/2022 07:42, Wojciech Kudla wrote: >>> Hi Maurizio, >>> >>> Thanks for staying on this. >>> >>> > Could you please provide a rough list of the native calls >>> you make where you believe critical JNI is having a real >>> impact in the performance of your application? >>> >>> From the top of my head: >>> clock_gettime >>> recvmsg >>> recvmmsg >>> sendmsg >>> sendmmsg >>> select >>> getpid >>> getcpu >>> getrusage >>> >>> > Also, could you please tell us whether any of these calls >>> need to interact with Java arrays? >>> No arrays or objects of any type involved. Everything >>> happens by the means of passing raw pointers as longs and >>> using other primitive types as function arguments. >>> >>> > In other words, do you use critical JNI to remove the cost >>> associated with thread transitions, or are you also taking >>> advantage of accessing on-heap memory _directly_ from native >>> code? >>> Criticial JNI natives are used solely to remove the cost of >>> transitions. We don't get anywhere near java heap in native >>> code. >>> >>> In general I think it makes a lot of sense for Java as a >>> language/platform to have some guards around unsafe code, >>> but on the other hand the popularity of libraries employing >>> Unsafe and their success in more performance-oriented >>> corners of software engineering is a clear indicator there >>> is a need for the JVM to provide access to more low-level >>> primitives and mechanisms. >>> I think it's entirely fair to tell developers that all bets >>> are off when they get into some non-idiomatic scenarios but >>> please don't take away a feature that greatly contributed to >>> Java's success. >>> >>> Kind regards, >>> Wojtek >>> >>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore >>> wrote: >>> >>> Hi Wojciech, >>> picking up this thread again. 
After some internal >>> discussion, we realize that we don't know enough about >>> your use case. While re-enabling JNI critical would >>> obviously provide a quick fix, we're afraid that (a) >>> developers might end up depending on JNI critical when >>> they don't need to (perhaps also unaware of the >>> consequences of depending on it) and (b) that there >>> might actually be _better_ (as in: much faster) >>> solutions than using critical native calls to address at >>> least some of your use cases (that seemed to be the case >>> with the clock_gettime example you mentioned). Could you >>> please provide a rough list of the native calls you make >>> where you believe critical JNI is having a real impact >>> in the performance of your application? Also, could you >>> please tell us whether any of these calls need to >>> interact with Java arrays? In other words, do you use >>> critical JNI to remove the cost associated with thread >>> transitions, or are you also taking advantage of >>> accessing on-heap memory _directly_ from native code? >>> >>> Regards >>> Maurizio >>> >>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>> Hi Mark, >>>> >>>> Thanks for your input and apologies for the delayed >>>> response. >>>> >>>> > If the platform included, say, an intrinsified >>>> System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), how >>>> much would >>>> that help developers in your unnamed industry? >>>> >>>> Exposing realtime clock with nanosecond granularity in >>>> the JDK would be a great step forward. I should have >>>> made it clear that I represent fintech corner >>>> (investment banking to be exact) but the issues my >>>> message touches upon span areas such as HPC, audio >>>> processing, gaming, and defense industry so it's not >>>> like we have an isolated case. >>>> >>>> > In a similar vein, if people are finding it necessary >>>> to "replace parts >>>> of NIO with hand-crafted native code"
then it would be >>>> interesting to >>>> understand what their requirements are >>>> >>>> As for the other example I provided with making very >>>> short lived syscalls such as recvmsg/recvmmsg the >>>> premise is getting access to hardware timestamps on the >>>> ingress and egress ends as well as enabling batch >>>> receive with a single syscall and otherwise exploiting >>>> features unavailable from the JDK (like access to CMSG >>>> interface, scatter/gather, etc). >>>> There are also other examples of calls that we'd love >>>> to make often and at lowest possible cost (e.g. >>>> getrusage) but I'm not sure if there's a strong case >>>> for some of these ideas, that's why it might be worth >>>> looking into a more generic approach for performance >>>> sensitive code. >>>> Hope this does a better job at explaining where we're >>>> coming from than my previous messages. >>>> >>>> Thanks, >>>> W >>>> >>>> On Tue, Jun 7, 2022 at 6:31 PM >>>> wrote: >>>> >>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>> >> Yes for System.nanoTime(), but >>>> System.currentTimeMillis() reports >>>> >> CLOCK_REALTIME. >>>> > >>>> > Unfortunately System.currentTimeMillis() offers >>>> only millisecond >>>> > granularity which is the reason why our industry >>>> has to resort to >>>> > clock_gettime. >>>> >>>> If the platform included, say, an intrinsified >>>> System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), >>>> how much would >>>> that help developers in your unnamed industry? >>>> >>>> In a similar vein, if people are finding it >>>> necessary to "replace parts >>>> of NIO with hand-crafted native code" then it would >>>> be interesting to >>>> understand what their requirements are. Some >>>> simple enhancements to >>>> the NIO API would be much less costly to design and >>>> implement than a >>>> generalized user-level native-call intrinsification >>>> mechanism.
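As a point of comparison for the System.nanoRealTime() idea quoted above, java.time.Instant already exposes wall-clock time with a nanosecond field. The sketch below is only an illustration under stated assumptions: WallClockNanos and epochNanos are made-up names, the effective resolution depends on the platform and JDK version, and Instant.now() returns an object, so this is not claimed to match the cost of a critical-native clock_gettime call.

```java
import java.time.Instant;

// Minimal sketch: wall-clock (realtime) timestamp expressed as nanoseconds
// since the Unix epoch, using only the standard java.time API.
// WallClockNanos and epochNanos are hypothetical names used for illustration.
final class WallClockNanos {

    // Combines the seconds and nanosecond-of-second fields into one long.
    // The exact-arithmetic helpers throw ArithmeticException on overflow
    // rather than silently wrapping.
    static long epochNanos() {
        Instant now = Instant.now();
        long secondsAsNanos = Math.multiplyExact(now.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(secondsAsNanos, now.getNano());
    }

    public static void main(String[] args) {
        System.out.println("epoch nanos = " + epochNanos());
        System.out.println("epoch millis = " + System.currentTimeMillis());
    }
}
```

Both values come from the system wall clock, so dividing the first by 1,000,000 should land close to the second; how many of the low-order digits are meaningful is platform dependent.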
>>>> >>>> - Mark From aph at openjdk.org Mon Jul 4 13:29:40 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 4 Jul 2022 13:29:40 GMT Subject: Integrated: 8288971: AArch64: Clean up stack and register handling in interpreter In-Reply-To: References: Message-ID: On Wed, 22 Jun 2022 13:00:44 GMT, Andrew Haley wrote: > There are several places in the interpreter that could be improved. > > 1. We use r13 to pass the caller's SP to a callee through adapters. r13 is not a callee-saved register in the native ABI, so this causes some complications. Use a callee-saved register. > 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. > 3. Related to 1, we should clearly label all the places where the caller's SP is passed to a callee. This pull request has now been integrated. Changeset: b5d96565 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/b5d965656d937e31ca7d3224c4e981d5083091c9 Stats: 170 lines in 15 files changed: 66 ins; 38 del; 66 mod 8288971: AArch64: Clean up stack and register handling in interpreter Reviewed-by: adinn, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/9239 From stuefe at openjdk.org Mon Jul 4 13:31:43 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Jul 2022 13:31:43 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v3] In-Reply-To: References: Message-ID: > [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros.
> > We now forbid all functions that return C-heap memory, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know off the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. > > I found a number of places where raw allocation functions were used, mostly strdup. I either changed those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. > > Places that allow raw C functions: > - decoder on Linux, since the C++ demangler returns raw C heap > - realpath, in conjunction with allowing real free for the returned buffer > - ZGC uses posix_memalign for a static global buffer that is never deleted. Continuing to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point > - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities > - obviously os::malloc and friends > - NMT pre-initialization code because of circularities > - In gtest main function - I think gtest should work always, even if os::malloc is broken. > > Places I fixed: > - ZGC, mountpoint string handling > - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay > - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter > - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this > - A couple of places in gtests. > > Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header.
> > ---- > > Tests: I built and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested the build on Alpine x64. > > GHAs are in progress. > > > [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Use our realpath() wrapper in os_perf_linux.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9356/files - new: https://git.openjdk.org/jdk/pull/9356/files/b93ca75f..f304868d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9356&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9356&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9356.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9356/head:pull/9356 PR: https://git.openjdk.org/jdk/pull/9356 From stuefe at openjdk.org Mon Jul 4 13:31:46 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Jul 2022 13:31:46 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v2] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 10:41:32 GMT, Kim Barrett wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - Forgot one.. >> - Review feedback Kim > > src/hotspot/os/linux/os_perf_linux.cpp line 788: > >> 786: jio_snprintf(buffer, PATH_MAX, "/proc/%s/exe", _entry->d_name); >> 787: buffer[PATH_MAX - 1] = '\0'; >> 788: ALLOW_C_FUNCTION(::realpath, return realpath(buffer, _exePath);) > > Shouldn't this be using `os::Posix::realpath`? Yes, I agree. Fixed.
------------- PR: https://git.openjdk.org/jdk/pull/9356 From thomas.stuefe at gmail.com Mon Jul 4 13:36:06 2022 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 4 Jul 2022 15:36:06 +0200 Subject: Should we rename os:: functions that are named like standard C- or Posix-functions? In-Reply-To: References: <6d8939d9-8f3e-6bd0-2e33-b54259a2d5a6@oracle.com> <9A315AE7-A1F0-4837-A2C7-56A88C526CCB@oracle.com> Message-ID: On Mon, Jul 4, 2022 at 12:46 PM Kim Barrett wrote: > > On Jul 4, 2022, at 3:18 AM, Thomas Stüfe > wrote: > > On Sun, Jul 3, 2022 at 10:59 PM Kim Barrett > wrote: > > > On Jul 3, 2022, at 4:47 AM, Thomas Stüfe > wrote: > > > > > > I am preparing a patch to forbid C-heap allocation functions in > hotspot as you proposed (https://github.com/openjdk/jdk/pull/9356). > > > > > > Interestingly, not all occurrences of forbidden functions are found > everywhere. I found that if I compile on Ubuntu 20.04 with gcc 10.3., it > does not complain about "realpath" even though I forbade it. If I build on > Alpine, gcc 10.3.1, it finds occurrences of realpath. > > > > In which build variants? All? Or only fastdebug? If the latter, this > might be another case of > > _FORTIFY_SOURCE rewriting the call first, dodging the warning. This is > mentioned in the > > comment describing the gcc implementation of FORBID_C_FUNCTION. > > > > > > No, it fails also on release to recognize realpath. Just to be sure I > tested the most important other candidates (malloc, free, realloc, calloc, > strdup) and those all work. > > It works (fails with expected warning) for me. gcc 11.2, in case that > matters. > > The warning mechanism is only supported for gcc 10+. > > Interesting. My failing example uses gcc 10.3.0, the one that works uses 10.3.1. Maybe there was a bug fix.
From adinn at openjdk.org Mon Jul 4 13:58:40 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 4 Jul 2022 13:58:40 GMT Subject: RFR: 8288992: AArch64: CMN should be handled the same way as CMP In-Reply-To: References: Message-ID: On Wed, 22 Jun 2022 17:03:42 GMT, Andrew Haley wrote: > At present, `cmp(r8, -1)` fails at compile time, but `cmn(r8, -1)` fails at runtime. We should fix `cmn()` to be the same as `cmp()`. > > After this change, it's much less likely that we'll be surprised by immediate overflows in `cmn()`. Looks good. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9246 From ngasson at openjdk.org Mon Jul 4 14:12:46 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 4 Jul 2022 14:12:46 GMT Subject: RFR: 8288992: AArch64: CMN should be handled the same way as CMP In-Reply-To: References: Message-ID: On Wed, 22 Jun 2022 17:03:42 GMT, Andrew Haley wrote: > At present, `cmp(r8, -1)` fails at compile time, but `cmn(r8, -1)` fails at runtime. We should fix `cmn()` to be the same as `cmp()`. > > After this change, it's much less likely that we'll be surprised by immediate overflows in `cmn()`. I tested tier1-3. ------------- Marked as reviewed by ngasson (Reviewer). PR: https://git.openjdk.org/jdk/pull/9246 From wkudla.kernel at gmail.com Mon Jul 4 14:58:01 2022 From: wkudla.kernel at gmail.com (Wojciech Kudla) Date: Mon, 4 Jul 2022 15:58:01 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: > I also assume that the case when "there's nothing to read" is common enough to make a difference?
Yes, I'd say the "nothing on the wire" is at least a three nines scenario but even in the presence of data in the NIC's rx ring the call will complete in low tens of nanos anyway so the overhead of a JNI call matters in both cases. On Mon, Jul 4, 2022 at 2:27 PM Maurizio Cimadamore < maurizio.cimadamore at oracle.com> wrote: > Thanks for the clarification, this is very helpful. > > I also assume that the case when "there's nothing to read" is common > enough to make a difference? > > Maurizio > > > On 04/07/2022 13:50, Wojciech Kudla wrote: > > Hi Maurizio, > > You are correct that under normal circumstances syscalls that are not > supported by vDSO are very heavy but when we call recvmsg/sendmsg we don't > even perform a syscall at all. High frequency trading shops employ kernel > bypass for all network flows pretty much by default. The most popular > solution here is OpenOnload used with Xilinx products. For a case when > there's nothing to read from the RX ring a JavaCritical JNI call to recvmsg > completes in ~11ns vs 23ns for a standard JNI call with full transition. > Sorry, I've been in this for so long I kind of assumed it's implied. > > Thanks, > W. > > On Mon, Jul 4, 2022 at 12:59 PM Maurizio Cimadamore < > maurizio.cimadamore at oracle.com> wrote: > >> Hi, >> while I'm not an expert with some of the IO calls you mention (some of my >> colleagues are more knowledgeable in this area, so I'm sure they will have >> more info), my general sense is that, as with getrusage, if there is a >> system call involved, you already pay a hefty price for the user to kernel >> transition. On my machine this seems to cost around 200ns. In these cases, >> using JNI critical to shave off a dozen nanoseconds (at best!) seems >> just not worth it.
>> >> So, of the functions in your list, the ones in which I *believe* >> dropping transitions would have the most effect are (if we exclude getpid, >> for which another approach is possible) clock_gettime and getcpu, I >> believe, as they might use vdso [1], which typically brings the performance >> of these calls closer to calls to shared lib functions. >> >> If you have examples e.g. where performance of recvmsg (or related calls) >> varies significantly between base JNI and critical JNI, please send them >> our way; I'm sure some of my colleagues would be interested to take a look. >> >> Popping back a couple of levels, I think it would be helpful to also >> define what's an acceptable regression in this context. Of course, in an >> ideal world, we'd like to see no performance regression at all. But JNI >> critical is an unsupported interface, which might misbehave with modern >> garbage collectors (e.g. ZGC) and that requires quite a bit of internal >> complexity which might, in the medium/long run, hinder the evolution of the >> Java platform (all these things have _some_ cost, even if the cost is not >> directly material to developers). In this vein, I think calls like >> clock_gettime tend to be more problematic: as they complete very quickly, >> you see the cost of transitions a lot more. In other cases, where syscalls >> are involved, the costs associated with transitions are more likely to be "in >> the noise". Of course if we look at absolute numbers, dropping transitions >> would always yield "faster" code; but at the same time, going from 250ns to >> 245ns is very unlikely to result in visible performance difference when >> considering an application as a whole, so I think it's critical here to >> decide _which_ use cases to prioritize. >> >> I think a good outcome of this discussion would be if we could come to >> some shared understanding of which native calls are truly problematic (e.g.
>> clock_gettime-like), and then for the JDK to provide better (and more >> maintainable) alternatives for those (which might even be faster than using >> critical JNI). >> >> Thanks >> Maurizio >> >> [1] - https://man7.org/linux/man-pages/man7/vdso.7.html >> >> On 04/07/2022 12:23, Wojciech Kudla wrote: >> >> Thanks Maurizio, >> >> I raised this case mainly about clock_gettime and recvmsg/sendmsg, I >> think we're focusing on the wrong things here. Feel free to drop the two >> syscalls from the discussion entirely, but the main usecases I have been >> presenting throughout this thread definitely stand. >> >> Thanks >> >> >> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore < >> maurizio.cimadamore at oracle.com> wrote: >> >>> Hi Wojtek, >>> thanks for sharing this list, I think this is a good starting point to >>> understand more about your use case. >>> >>> Last week I've been looking at "getrusage" (as you mentioned it in an >>> earlier email), and I was surprised to see that the call took a pointer to >>> a (fairly big) struct which then needed to be initialized with some >>> thread-local state: >>> >>> https://man7.org/linux/man-pages/man2/getrusage.2.html >>> >>> >>> I've looked at the implementation, and it seems to be doing memset on >>> the user-provided struct pointer, plus all the fields assignment. >>> Eyeballing the implementation, this does not seem to me like a "classic" >>> use case where dropping transition would help much. I mean, surely dropping >>> transitions would help shaving some nanoseconds off the call, but it >>> doesn't seem to me that the call would be shortlived enough to make a >>> difference. Do you have some benchmarks on this one? I did some [1] and the >>> call overhead seemed to come up at 260ns/op - w/o transition you might >>> perhaps be able to get to 250ns, but that's in the noise? 
>>> >>> As for getpid, note that you can do (since Java 9): >>> >>> ProcessHandle.current().pid(); >>> >>> I believe the impl caches the result, so it shouldn't even make the >>> native call. >>> >>> Maurizio >>> >>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >>> On 02/07/2022 07:42, Wojciech Kudla wrote: >>> >>> Hi Maurizio, >>> >>> Thanks for staying on this. >>> >>> > Could you please provide a rough list of the native calls you make >>> where you believe critical JNI is having a real impact in the performance >>> of your application? >>> >>> Off the top of my head: >>> clock_gettime >>> recvmsg >>> recvmmsg >>> sendmsg >>> sendmmsg >>> select >>> getpid >>> getcpu >>> getrusage >>> >>> > Also, could you please tell us whether any of these calls need to >>> interact with Java arrays? >>> No arrays or objects of any type involved. Everything happens by the >>> means of passing raw pointers as longs and using other primitive types as >>> function arguments. >>> >>> > In other words, do you use critical JNI to remove the cost associated >>> with thread transitions, or are you also taking advantage of accessing >>> on-heap memory _directly_ from native code? >>> Critical JNI natives are used solely to remove the cost of transitions. >>> We don't get anywhere near the Java heap in native code. >>> >>> In general I think it makes a lot of sense for Java as a >>> language/platform to have some guards around unsafe code, but on the other >>> hand the popularity of libraries employing Unsafe and their success in more >>> performance-oriented corners of software engineering is a clear indicator >>> there is a need for the JVM to provide access to more low-level primitives >>> and mechanisms. >>> I think it's entirely fair to tell developers that all bets are off when >>> they get into some non-idiomatic scenarios but please don't take away a >>> feature that greatly contributed to Java's success.
>>> >>> Kind regards, >>> Wojtek >>> >>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < >>> maurizio.cimadamore at oracle.com> wrote: >>> >>>> Hi Wojciech, >>>> picking up this thread again. After some internal discussion, we >>>> realize that we don't know enough about your use case. While re-enabling >>>> JNI critical would obviously provide a quick fix, we're afraid that (a) >>>> developers might end up depending on JNI critical when they don't need to >>>> (perhaps also unaware of the consequences of depending on it) and (b) that >>>> there might actually be _better_ (as in: much faster) solutions than using >>>> critical native calls to address at least some of your use cases (that >>>> seemed to be the case with the clock_gettime example you mentioned). Could >>>> you please provide a rough list of the native calls you make where you >>>> believe critical JNI is having a real impact in the performance of your >>>> application? Also, could you please tell us whether any of these calls need >>>> to interact with Java arrays? In other words, do you use critical JNI to >>>> remove the cost associated with thread transitions, or are you also taking >>>> advantage of accessing on-heap memory _directly_ from native code? >>>> >>>> Regards >>>> Maurizio >>>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>> >>>> Hi Mark, >>>> >>>> Thanks for your input and apologies for the delayed response. >>>> >>>> > If the platform included, say, an intrinsified System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>> that help developers in your unnamed industry? >>>> >>>> Exposing realtime clock with nanosecond granularity in the JDK would be >>>> a great step forward. I should have made it clear that I represent fintech >>>> corner (investment banking to be exact) but the issues my message touches >>>> upon span areas such as HPC, audio processing, gaming, and defense industry >>>> so it's not like we have an isolated case. 
>>>> >>>> > In a similar vein, if people are finding it necessary to "replace >>>> parts >>>> of NIO with hand-crafted native code" then it would be interesting to >>>> understand what their requirements are >>>> >>>> As for the other example I provided with making very short lived >>>> syscalls such as recvmsg/recvmmsg the premise is getting access to hardware >>>> timestamps on the ingress and egress ends as well as enabling batch receive >>>> with a single syscall and otherwise exploiting features unavailable from >>>> the JDK (like access to CMSG interface, scatter/gather, etc). >>>> There are also other examples of calls that we'd love to make often and >>>> at lowest possible cost (e.g. getrusage) but I'm not sure if there's a >>>> strong case for some of these ideas, that's why it might be worth looking >>>> into a more generic approach for performance sensitive code. >>>> Hope this does a better job at explaining where we're coming from than my >>>> previous messages. >>>> >>>> Thanks, >>>> W >>>> >>>> On Tue, Jun 7, 2022 at 6:31 PM wrote: >>>> >>>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>>>> >> CLOCK_REALTIME. >>>>> > >>>>> > Unfortunately System.currentTimeMillis() offers only millisecond >>>>> > granularity which is the reason why our industry has to resort to >>>>> > clock_gettime. >>>>> >>>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>>> that help developers in your unnamed industry? >>>>> >>>>> In a similar vein, if people are finding it necessary to "replace parts >>>>> of NIO with hand-crafted native code" then it would be interesting to >>>>> understand what their requirements are. Some simple enhancements to >>>>> the NIO API would be much less costly to design and implement than a >>>>> generalized user-level native-call intrinsification mechanism.
>>>>> >>>>> - Mark From alanb at openjdk.org Mon Jul 4 15:02:52 2022 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 4 Jul 2022 15:02:52 GMT Subject: RFR: 8288971: AArch64: Clean up stack and register handling in interpreter [v4] In-Reply-To: References: Message-ID: On Thu, 30 Jun 2022 15:14:29 GMT, Andrew Haley wrote: >> There are several places in the interpreter that could be improved. >> >> 1. We use r13 to pass the caller's SP to a callee through adapters. r13 is not a callee-saved register in the native ABI, so this causes some complications. Use a callee-saved register. >> 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. >> 3. Related to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Update templateInterpreterGenerator_aarch64.cpp Most of the JVM TI tests for virtual threads are now failing on aarch64, they are hitting this assert
# Internal Error (/workspace/open/src/hotspot/share/runtime/thread.hpp:480), pid=1448788, tid=1448821
# assert(stack_base() > limit && limit >= stack_end()) failed: limit is outside of stack
:
V [libjvm.so+0x17b1af0] StackOverflow::enable_stack_reserved_zone(bool)+0x5c
V [libjvm.so+0x174b688] SharedRuntime::enable_stack_reserved_zone(JavaThread*)+0x44
j jdk.internal.vm.Continuation.onContinue()V+0 java.base@20-ea
j jdk.internal.vm.Continuation.yield0(Ljdk/internal/vm/ContinuationScope;Ljdk/internal/vm/Continuation;)Z+317 java.base@20-ea
j jdk.internal.vm.Continuation.yield(Ljdk/internal/vm/ContinuationScope;)Z+69 java.base@20-ea
J 310 c1 java.lang.VirtualThread.yieldContinuation()Z java.base@20-ea (55 bytes) @ 0x0000ffff9d0503b8 [0x0000ffff9d050200+0x00000000000001b8]
J 155 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base@20-ea (0 bytes) @ 0x0000ffffa4aaf738 [0x0000ffffa4aaf6c0+0x0000000000000078]
J 326 c1 jdk.internal.vm.Continuation.run()V java.base@20-ea (586 bytes) @ 0x0000ffff9d05f8e8 [0x0000ffff9d05f040+0x00000000000008a8]
J 324 c1 java.lang.VirtualThread.runContinuation()V java.base@20-ea (135 bytes) @ 0x0000ffff9d05daf8 [0x0000ffff9d05d5c0+0x0000000000000538]
J 323 c1 java.lang.VirtualThread$$Lambda$8+0x000000080104b138.run()V java.base@20-ea (8 bytes) @ 0x0000ffff9d05d07c [0x0000ffff9d05cf80+0x00000000000000fc]
J 321 c1 java.util.concurrent.ForkJoinTask.doExec()I java.base@20-ea (37 bytes) @ 0x0000ffff9d05b094 [0x0000ffff9d05ae80+0x0000000000000214]
J 299 c1 java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V java.base@20-ea (83 bytes) @ 0x0000ffff9d046f0c [0x0000ffff9d046dc0+0x000000000000014c]
J 214 c1 java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;II)I java.base@20-ea (250 bytes) @ 0x0000ffff9d01beac [0x0000ffff9d01b840+0x000000000000066c]
j java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+35 java.base@20-ea
j java.util.concurrent.ForkJoinWorkerThread.run()V+31 java.base@20-ea
v ~StubRoutines::call_stub 0x0000ffffa45001bc
V [libjvm.so+0xf8242c] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x5ac
V [libjvm.so+0xf82a88] JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0x3e8
V [libjvm.so+0xf82e04] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x70
V [libjvm.so+0x10fbdb8] thread_entry(JavaThread*, JavaThread*)+0x118
V [libjvm.so+0xfb8bf4] JavaThread::thread_main_inner()+0x250
V [libjvm.so+0x18d0ac8] Thread::call_run()+0xf8
V [libjvm.so+0x15db2e4] thread_native_entry(Thread*)+0x104
C [libpthread.so.0+0x78f8] start_thread+0x188
------------- PR: https://git.openjdk.org/jdk/pull/9239 From aph at openjdk.org Mon Jul 4 15:02:53 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 4 Jul 2022 15:02:53 GMT Subject: RFR: 8288971: AArch64: Clean up stack and register handling in interpreter [v4] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 14:58:14 GMT, Alan Bateman wrote: > Most of the JVM TI tests for virtual threads are now failing on aarch64, they are hitting this assert OK, sorry. Can you tell me the name of one of them? ------------- PR: https://git.openjdk.org/jdk/pull/9239 From alanb at openjdk.org Mon Jul 4 15:18:42 2022 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 4 Jul 2022 15:18:42 GMT Subject: RFR: 8288971: AArch64: Clean up stack and register handling in interpreter [v4] In-Reply-To: References: Message-ID: On Thu, 30 Jun 2022 15:14:29 GMT, Andrew Haley wrote: >> There are several places in the interpreter that could be improved. >> >> 1. We use r13 to pass the caller's SP to a callee through adapters. r13 is not a callee-saved register in the native ABI, so this causes some complications. Use a callee-saved register. >> 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. >> 3. Related to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Update templateInterpreterGenerator_aarch64.cpp tier1 or run hotspot/jtreg:jdk_loom. I think most of the jdk/jdk_loom tests will fail too, but for other reasons.
------------- PR: https://git.openjdk.org/jdk/pull/9239 From aph at openjdk.org Mon Jul 4 15:26:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 4 Jul 2022 15:26:41 GMT Subject: RFR: 8288971: AArch64: Clean up stack and register handling in interpreter [v4] In-Reply-To: References: Message-ID: <0N5feKUmymeWGs36wq6P3lJmQ0_Ch5xHxjq-bRuBbqw=.5128b631-6653-4ea6-9098-caa4eb8207ff@github.com> On Mon, 4 Jul 2022 15:15:31 GMT, Alan Bateman wrote: > tier1 or run hotspot/jtreg:jdk_loom. I think most of the jdk/jdk_loom tests will fail too, but for other reasons. OK, I think I know what that is. I'm on it. ------------- PR: https://git.openjdk.org/jdk/pull/9239 From mgronlun at openjdk.org Mon Jul 4 15:38:39 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 4 Jul 2022 15:38:39 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 09:07:36 GMT, Matthias Baesken wrote: >> Looks good and simple > >> Looks good and simple > > Hi, thanks for the review ! May I have a second review ? Hi @MBaesken, perhaps we should take a larger view of this functionality and incorporate it into the SweepCodeCache event. I don't see any information provided that would reflect on overall CodeCache memory before vs after in relation to the sweeper? There are only counts. What if we extend the event SweepCodeCache with fields to reflect "memory before sweep", "memory after sweep" and a boolean "compiler restart". In addition, there are metadata aspects that need to be addressed, like for example, this is not a durational event, and since it is issued only by the Sweeper thread it will not have a stack trace etc.
------------- PR: https://git.openjdk.org/jdk/pull/9334 From egahlin at openjdk.org Mon Jul 4 15:45:42 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Mon, 4 Jul 2022 15:45:42 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. A memory before or after field should have the contentType="bytes" and we might as well use ulong as data type. There is no additional cost since data is stored using compressed integers. ------------- PR: https://git.openjdk.org/jdk/pull/9334 From aph at openjdk.org Mon Jul 4 15:47:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 4 Jul 2022 15:47:41 GMT Subject: RFR: 8288971: AArch64: Clean up stack and register handling in interpreter [v4] In-Reply-To: References: Message-ID: On Thu, 30 Jun 2022 15:14:29 GMT, Andrew Haley wrote: >> There are several places in the interpreter that could be improved. >> >> 1. We use r13 to pass the caller's SP to a callee through adapters. r13 is not a callee-saved register in the native ABI, so this causes some complications. Use a callee-saved register. >> 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. >> 3. Related to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Update templateInterpreterGenerator_aarch64.cpp On 7/4/22 16:15, Alan Bateman wrote: > tier1 or run hotspot/jtreg:jdk_loom. I think most of the jdk/jdk_loom tests will fail too, but for other reasons. 
https://github.com/openjdk/jdk/pull/9367 Tests are running now. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 ------------- PR: https://git.openjdk.org/jdk/pull/9239 From mgronlun at openjdk.org Mon Jul 4 15:49:38 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 4 Jul 2022 15:49:38 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: <4xtfeIRXO1aGcppyAPaxRbGWKqWWWszGwiRSlWijVnE=.4845b072-6989-40a3-81d2-d5cccd3b11ab@github.com> On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. Thanks for clarifying, Erik. I forgot to mention the contentType. ------------- PR: https://git.openjdk.org/jdk/pull/9334 From vitalyd at gmail.com Mon Jul 4 16:50:59 2022 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 4 Jul 2022 12:50:59 -0400 Subject: Obsoleting JavaCritical In-Reply-To: <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: I'd add rdtsc(p) wrapper functions to the list. These are usually either inline asm or compiler intrinsic in the JNI entrypoint. In addition, any native libs exposed via JNI that have "trivial" functions are also candidates for faster calling conventions. There are sometimes ways to mitigate the call overhead (e.g. batching) but it's not always feasible.
I'll add that last time I tried to measure the improvement of Java criticals for clock_gettime (and rdtsc) it looked to be in the noise on the hardware I was testing on. It got to the point where I had to instrument the critical and normal JNI entrypoints to confirm the critical was being hit. The critical calling convention isn't significantly different *if* basic primitives (or no args at all) are passed as args. JNIEnv*, IIRC, is loaded from a register so that's minor. jclass (for static calls, which is what's relevant here) should be a compiled constant. Critical call still has a GCLocker check. So I'm not actually sure what the significant difference is for "lightweight" (i.e. few primitive or no args, primitive return types) calls. In general, I do think it'd be nice if there was a faster native call sequence, even if it comes with a caveat emptor and/or special requirements on the callee (not unlike the requirements for criticals). I think Vladimir Ivanov was working on "snippets" that allowed dynamic construction of a native call, possibly including assembly. Not sure where that exploration is these days, but that would be a welcome capability. My $.02. Happy 4th of July for those celebrating! Vitaly On Mon, Jul 4, 2022 at 12:04 PM Maurizio Cimadamore < maurizio.cimadamore at oracle.com> wrote: > Hi, > while I'm not an expert with some of the IO calls you mention (some of my > colleagues are more knowledgeable in this area, so I'm sure they will have > more info), my general sense is that, as with getrusage, if there is a > system call involved, you already pay a hefty price for the user to kernel > transition. On my machine this seems to cost around 200ns. In these cases, > using JNI critical to shave off a dozen nanoseconds (at best!) seems > just not worth it.
> > So, of the functions in your list, the ones in which I *believe* dropping > transitions would have the most effect are (if we exclude getpid, for which > another approach is possible) clock_gettime and getcpu, I believe, as they > might use vdso [1], which typically brings the performance of these calls > closer to calls to shared lib functions. > > If you have examples e.g. where performance of recvmsg (or related calls) > varies significantly between base JNI and critical JNI, please send them > our way; I'm sure some of my colleagues would be interested to take a look. > > Popping back a couple of levels, I think it would be helpful to also > define what's an acceptable regression in this context. Of course, in an > ideal world, we'd like to see no performance regression at all. But JNI > critical is an unsupported interface, which might misbehave with modern > garbage collectors (e.g. ZGC) and that requires quite a bit of internal > complexity which might, in the medium/long run, hinder the evolution of the > Java platform (all these things have _some_ cost, even if the cost is not > directly material to developers). In this vein, I think calls like > clock_gettime tend to be more problematic: as they complete very quickly, > you see the cost of transitions a lot more. In other cases, where syscalls > are involved, the cost associated with transitions is more likely to be "in > the noise". Of course if we look at absolute numbers, dropping transitions > would always yield "faster" code; but at the same time, going from 250ns to > 245ns is very unlikely to result in a visible performance difference when > considering an application as a whole, so I think it's critical here to > decide _which_ use cases to prioritize. > > I think a good outcome of this discussion would be if we could come to > some shared understanding of which native calls are truly problematic (e.g.
> clock_gettime-like), and then for the JDK to provide better (and more > maintainable) alternatives for those (which might even be faster than using > critical JNI). > > Thanks > Maurizio > > [1] - https://man7.org/linux/man-pages/man7/vdso.7.html > On 04/07/2022 12:23, Wojciech Kudla wrote: > > Thanks Maurizio, > > I raised this case mainly about clock_gettime and recvmsg/sendmsg, I think > we're focusing on the wrong things here. Feel free to drop the two syscalls > from the discussion entirely, but the main usecases I have been presenting > throughout this thread definitely stand. > > Thanks > > > On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore < > maurizio.cimadamore at oracle.com> wrote: > >> Hi Wojtek, >> thanks for sharing this list, I think this is a good starting point to >> understand more about your use case. >> >> Last week I've been looking at "getrusage" (as you mentioned it in an >> earlier email), and I was surprised to see that the call took a pointer to >> a (fairly big) struct which then needed to be initialized with some >> thread-local state: >> >> https://man7.org/linux/man-pages/man2/getrusage.2.html >> >> I've looked at the implementation, and it seems to be doing memset on the >> user-provided struct pointer, plus all the fields assignment. Eyeballing >> the implementation, this does not seem to me like a "classic" use case >> where dropping transition would help much. I mean, surely dropping >> transitions would help shaving some nanoseconds off the call, but it >> doesn't seem to me that the call would be shortlived enough to make a >> difference. Do you have some benchmarks on this one? I did some [1] and the >> call overhead seemed to come up at 260ns/op - w/o transition you might >> perhaps be able to get to 250ns, but that's in the noise? >> >> As for getpid, note that you can do (since Java 9): >> >> ProcessHandle.current().pid(); >> >> I believe the impl caches the result, so it shouldn't even make the >> native call. 
>> >> Maurizio >> >> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >> On 02/07/2022 07:42, Wojciech Kudla wrote: >> >> Hi Maurizio, >> >> Thanks for staying on this. >> >> > Could you please provide a rough list of the native calls you make >> where you believe critical JNI is having a real impact in the performance >> of your application? >> >> Off the top of my head: >> clock_gettime >> recvmsg >> recvmmsg >> sendmsg >> sendmmsg >> select >> getpid >> getcpu >> getrusage >> >> > Also, could you please tell us whether any of these calls need to >> interact with Java arrays? >> No arrays or objects of any type involved. Everything happens by >> means of passing raw pointers as longs and using other primitive types as >> function arguments. >> >> > In other words, do you use critical JNI to remove the cost associated >> with thread transitions, or are you also taking advantage of accessing >> on-heap memory _directly_ from native code? >> Critical JNI natives are used solely to remove the cost of transitions. >> We don't get anywhere near the Java heap in native code. >> >> In general I think it makes a lot of sense for Java as a >> language/platform to have some guards around unsafe code, but on the other >> hand the popularity of libraries employing Unsafe and their success in more >> performance-oriented corners of software engineering is a clear indicator >> there is a need for the JVM to provide access to more low-level primitives >> and mechanisms. >> I think it's entirely fair to tell developers that all bets are off when >> they get into some non-idiomatic scenarios but please don't take away a >> feature that greatly contributed to Java's success. >> >> Kind regards, >> Wojtek >> >> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < >> maurizio.cimadamore at oracle.com> wrote: >> >>> Hi Wojciech, >>> picking up this thread again. After some internal discussion, we realize >>> that we don't know enough about your use case.
While re-enabling JNI >>> critical would obviously provide a quick fix, we're afraid that (a) >>> developers might end up depending on JNI critical when they don't need to >>> (perhaps also unaware of the consequences of depending on it) and (b) that >>> there might actually be _better_ (as in: much faster) solutions than using >>> critical native calls to address at least some of your use cases (that >>> seemed to be the case with the clock_gettime example you mentioned). Could >>> you please provide a rough list of the native calls you make where you >>> believe critical JNI is having a real impact in the performance of your >>> application? Also, could you please tell us whether any of these calls need >>> to interact with Java arrays? In other words, do you use critical JNI to >>> remove the cost associated with thread transitions, or are you also taking >>> advantage of accessing on-heap memory _directly_ from native code? >>> >>> Regards >>> Maurizio >>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>> >>> Hi Mark, >>> >>> Thanks for your input and apologies for the delayed response. >>> >>> > If the platform included, say, an intrinsified System.nanoRealTime() >>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>> that help developers in your unnamed industry? >>> >>> Exposing a realtime clock with nanosecond granularity in the JDK would be >>> a great step forward. I should have made it clear that I represent the fintech >>> corner (investment banking to be exact) but the issues my message touches >>> upon span areas such as HPC, audio processing, gaming, and defense industry >>> so it's not like we have an isolated case. >>> >>> > In a similar vein, if people are finding it necessary to "replace parts >>> of NIO with hand-crafted native code"
then it would be interesting to >>> understand what their requirements are >>> >>> As for the other example I provided with making very short-lived >>> syscalls such as recvmsg/recvmmsg the premise is getting access to hardware >>> timestamps on the ingress and egress ends as well as enabling batch receive >>> with a single syscall and otherwise exploiting features unavailable from >>> the JDK (like access to CMSG interface, scatter/gather, etc). >>> There are also other examples of calls that we'd love to make often and >>> at lowest possible cost (i.e. getrusage) but I'm not sure if there's a >>> strong case for some of these ideas, that's why it might be worth looking >>> into a more generic approach for performance sensitive code. >>> Hope this does a better job at explaining where we're coming from than my >>> previous messages. >>> >>> Thanks, >>> W >>> >>> On Tue, Jun 7, 2022 at 6:31 PM wrote: >>> >>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>>> >> CLOCK_REALTIME. >>>> > >>>> > Unfortunately System.currentTimeMillis() offers only millisecond >>>> > granularity which is the reason why our industry has to resort to >>>> > clock_gettime. >>>> >>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>> that help developers in your unnamed industry? >>>> >>>> In a similar vein, if people are finding it necessary to "replace parts >>>> of NIO with hand-crafted native code" then it would be interesting to >>>> understand what their requirements are. Some simple enhancements to >>>> the NIO API would be much less costly to design and implement than a >>>> generalized user-level native-call intrinsification mechanism. >>>> >>>> - Mark >>>> >>> -- Sent from my phone
From wkudla.kernel at gmail.com Mon Jul 4 17:12:52 2022 From: wkudla.kernel at gmail.com (Wojciech Kudla) Date: Mon, 4 Jul 2022 18:12:52 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: Thanks for your input, Vitaly. I'd be interested to find out more about the nature of the HW noise you observed in your benchmarks as our results were very consistent and it was pretty straightforward to pinpoint the culprit as JNI call overhead. Maybe it was just easier for us because we disallow C- and P-state transitions and put a lot of effort into eliminating platform jitter in general. Were you maybe running on a CPU model that doesn't support constant TSC? I would also suggest retrying with LAPIC interrupts suppressed (with: cli/sti) to maybe see if it's the kernel and not the hardware. 100% agree on rdtsc(p) and snippets. There are some narrow use cases where one can get some substantial speedups with direct access to prefetch or by abusing misprediction to keep icache hot. These scenarios are sadly only available with inline assembly. I know of a few shops that go to the length of forking Graal, etc. to achieve that but am quite convinced such capabilities would be welcome and utilized by many more groups if they were easily accessible from Java. Thanks, W.
From vitalyd at gmail.com Mon Jul 4 17:26:16 2022 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 4 Jul 2022 13:26:16 -0400 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: On Mon, Jul 4, 2022 at 1:13 PM Wojciech Kudla wrote: > Thanks for your input, Vitaly. I'd be interested to find out more about > the nature of the HW noise you observed in your benchmarks as our results > were very consistent and it was pretty straightforward to pinpoint the > culprit as JNI call overhead. Maybe it was just easier for us because we > disallow C- and P-state transitions and put a lot of effort into eliminating > platform jitter in general. Were you maybe running on a CPU model that > doesn't support constant TSC? I would also suggest retrying with LAPIC > interrupts suppressed (with: cli/sti) to maybe see if it's the kernel and > not the hardware. > This was on a Broadwell Xeon chipset with constant tsc. All the typical jitter sources were reduced: C/P states disabled in bios, max turbo enabled, IRQs steered away, core isolated, etc. By the way, by noise I don't mean the results themselves were noisy - they were constant run to run. I just meant the delta between normal vs critical JNI entrypoints was very minimal - i.e. "in the noise", particularly with rdtsc. I can try to remeasure on newer Intel but see below ... > > > 100% agree on rdtsc(p) and snippets. There are some narrow use cases where > one can get some substantial speedups with direct access to prefetch or by > abusing misprediction to keep icache hot. These scenarios are sadly only > available with inline assembly.
I know of a few shops that go to the length > of forking Graal, etc. to achieve that but am quite convinced such > capabilities would be welcome and utilized by many more groups if they were > easily accessible from Java. > I'm of the firm (and perhaps controversial for some :)) opinion these days that Java is simply the wrong platform/tool for low latency cases that warrant this level of control. There are very strong headwinds even outside of JNI costs. And the "real" problem with JNI, besides transition costs, is lack of inlining into the native calls. So even if JVM transition costs are fully eliminated, there's still an optimization fence due to lost inlining (not unlike native code calling native fns via shared libs). That's not to say that perf regressions are welcomed - nobody likes those :). > > > Thanks, > W. > > On Mon, Jul 4, 2022 at 5:51 PM Vitaly Davidovich > wrote: > >> I'd add rdtsc(p) wrapper functions to the list. These are usually either >> inline asm or compiler intrinsic in the JNI entrypoint. In addition, any >> native libs exposed via JNI that have "trivial" functions are also >> candidates for faster calling conventions. There are sometimes ways to >> mitigate the call overhead (e.g. batching) but it's not always feasible. >> >> I'll add that last time I tried to measure the improvement of Java >> criticals for clock_gettime (and rdtsc) it looked to be in the noise on the >> hardware I was testing on. It got to the point where I had to instrument the >> critical and normal JNI entrypoints to confirm the critical was being hit. >> The critical calling convention isn't significantly different *if* basic >> primitives (or no args at all) are passed as args. JNIEnv*, IIRC, is >> loaded from a register so that's minor. jclass (for static calls, which is >> what's relevant here) should be a compiled constant. Critical call still >> has a GCLocker check. So I'm not actually sure what the significant >> difference is for "lightweight"
(ie few primitive or no args, primitive >> return types) calls. >> >> In general, I do think it?d be nice if there was a faster native call >> sequence, even if it comes with a caveat emptor and/or special requirements >> on the callee (not unlike the requirements for criticals). I think >> Vladimir Ivanov was working on ?snippets? that allowed dynamic construction >> of a native call, possibly including assembly. Not sure where that >> exploration is these days, but that would be a welcome capability. >> >> My $.02. Happy 4th of July for those celebrating! >> >> Vitaly >> >> On Mon, Jul 4, 2022 at 12:04 PM Maurizio Cimadamore < >> maurizio.cimadamore at oracle.com> wrote: >> >>> Hi, >>> while I'm not an expert with some of the IO calls you mention (some of >>> my colleagues are more knowledgeable in this area, so I'm sure they will >>> have more info), my general sense is that, as with getrusage, if there is a >>> system call involved, you already pay a hefty price for the user to kernel >>> transition. On my machine this seem to cost around 200ns. In these cases, >>> using JNI critical to shave off a dozen of nanoseconds (at best!) seems >>> just not worth it. >>> >>> So, of the functions in your list, the ones in which I *believe* >>> dropping transitions would have the most effect are (if we exclude getpid, >>> for which another approach is possible) clock_gettime and getcpu, I >>> believe, as they might use vdso [1], which typically brings the performance >>> of these call closer to calls to shared lib functions. >>> >>> If you have examples e.g. where performance of recvmsg (or related >>> calls) varies significantly between base JNI and critical JNI, please send >>> them our way; I'm sure some of my colleagues would be intersted to take a >>> look. >>> >>> Popping back a couple of levels, I think it would be helpful to also >>> define what's an acceptable regression in this context. 
Of course, in an >>> ideal world, we'd like to see no performance regression at all. But JNI >>> critical is an unsupported interface, which might misbehave with modern >>> garbage collectors (e.g. ZGC) and that requires quite a bit of internal >>> complexity which might, in the medium/long run, hinder the evolution of the >>> Java platform (all these things have _some_ cost, even if the cost is not >>> directly material to developers). In this vein, I think calls like >>> clock_gettime tend to be more problematic: as they complete very quickly, >>> you see the cost of transitions a lot more. In other cases, where syscalls >>> are involved, the cost associated to transitions are more likely to be "in >>> the noise". Of course if we look at absolute numbers, dropping transitions >>> would always yield "faster" code; but at the same time, going from 250ns to >>> 245ns is very unlikely to result in visible performance difference when >>> considering an application as a whole, so I think it's critical here to >>> decide _which_ use cases to prioritize. >>> >>> I think a good outcome of this discussion would be if we could come to >>> some shared understanding of which native calls are truly problematic (e.g. >>> clock_gettime-like), and then for the JDK to provide better (and more >>> maintainable) alternatives for those (which might even be faster than using >>> critical JNI). >>> >>> Thanks >>> Maurizio >>> >>> [1] - https://man7.org/linux/man-pages/man7/vdso.7.html >>> On 04/07/2022 12:23, Wojciech Kudla wrote: >>> >>> Thanks Maurizio, >>> >>> I raised this case mainly about clock_gettime and recvmsg/sendmsg, I >>> think we're focusing on the wrong things here. Feel free to drop the two >>> syscalls from the discussion entirely, but the main usecases I have been >>> presenting throughout this thread definitely stand. 
>>> >>> Thanks >>> >>> >>> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore < >>> maurizio.cimadamore at oracle.com> wrote: >>> >>>> Hi Wojtek, >>>> thanks for sharing this list, I think this is a good starting point to >>>> understand more about your use case. >>>> >>>> Last week I've been looking at "getrusage" (as you mentioned it in an >>>> earlier email), and I was surprised to see that the call took a pointer to >>>> a (fairly big) struct which then needed to be initialized with some >>>> thread-local state: >>>> >>>> https://man7.org/linux/man-pages/man2/getrusage.2.html >>>> >>>> I've looked at the implementation, and it seems to be doing memset on >>>> the user-provided struct pointer, plus all the fields assignment. >>>> Eyeballing the implementation, this does not seem to me like a "classic" >>>> use case where dropping transition would help much. I mean, surely dropping >>>> transitions would help shaving some nanoseconds off the call, but it >>>> doesn't seem to me that the call would be shortlived enough to make a >>>> difference. Do you have some benchmarks on this one? I did some [1] and the >>>> call overhead seemed to come up at 260ns/op - w/o transition you might >>>> perhaps be able to get to 250ns, but that's in the noise? >>>> >>>> As for getpid, note that you can do (since Java 9): >>>> >>>> ProcessHandle.current().pid(); >>>> >>>> I believe the impl caches the result, so it shouldn't even make the >>>> native call. >>>> >>>> Maurizio >>>> >>>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >>>> On 02/07/2022 07:42, Wojciech Kudla wrote: >>>> >>>> Hi Maurizio, >>>> >>>> Thanks for staying on this. >>>> >>>> > Could you please provide a rough list of the native calls you make >>>> where you believe critical JNI is having a real impact in the performance >>>> of your application? 
>>>> >>>> From the top of my head: >>>> clock_gettime >>>> recvmsg >>>> recvmmsg >>>> sendmsg >>>> sendmmsg >>>> select >>>> getpid >>>> getcpu >>>> getrusage >>>> >>>> > Also, could you please tell us whether any of these calls need to >>>> interact with Java arrays? >>>> No arrays or objects of any type involved. Everything happens by the >>>> means of passing raw pointers as longs and using other primitive types as >>>> function arguments. >>>> >>>> > In other words, do you use critical JNI to remove the cost associated >>>> with thread transitions, or are you also taking advantage of accessing >>>> on-heap memory _directly_ from native code? >>>> Criticial JNI natives are used solely to remove the cost of >>>> transitions. We don't get anywhere near java heap in native code. >>>> >>>> In general I think it makes a lot of sense for Java as a >>>> language/platform to have some guards around unsafe code, but on the other >>>> hand the popularity of libraries employing Unsafe and their success in more >>>> performance-oriented corners of software engineering is a clear indicator >>>> there is a need for the JVM to provide access to more low-level primitives >>>> and mechanisms. >>>> I think it's entirely fair to tell developers that all bets are off >>>> when they get into some non-idiomatic scenarios but please don't take away >>>> a feature that greatly contributed to Java's success. >>>> >>>> Kind regards, >>>> Wojtek >>>> >>>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < >>>> maurizio.cimadamore at oracle.com> wrote: >>>> >>>>> Hi Wojciech, >>>>> picking up this thread again. After some internal discussion, we >>>>> realize that we don't know enough about your use case. 
While re-enabling
>>>>> JNI critical would obviously provide a quick fix, we're afraid that (a)
>>>>> developers might end up depending on JNI critical when they don't need to
>>>>> (perhaps also unaware of the consequences of depending on it) and (b) that
>>>>> there might actually be _better_ (as in: much faster) solutions than using
>>>>> critical native calls to address at least some of your use cases (that
>>>>> seemed to be the case with the clock_gettime example you mentioned). Could
>>>>> you please provide a rough list of the native calls you make where you
>>>>> believe critical JNI is having a real impact in the performance of your
>>>>> application? Also, could you please tell us whether any of these calls need
>>>>> to interact with Java arrays? In other words, do you use critical JNI to
>>>>> remove the cost associated with thread transitions, or are you also taking
>>>>> advantage of accessing on-heap memory _directly_ from native code?
>>>>>
>>>>> Regards
>>>>> Maurizio
>>>>> On 13/06/2022 21:38, Wojciech Kudla wrote:
>>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Thanks for your input and apologies for the delayed response.
>>>>>
>>>>> > If the platform included, say, an intrinsified System.nanoRealTime()
>>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would
>>>>> that help developers in your unnamed industry?
>>>>>
>>>>> Exposing a realtime clock with nanosecond granularity in the JDK would
>>>>> be a great step forward. I should have made it clear that I represent
>>>>> the fintech corner (investment banking to be exact) but the issues my message
>>>>> touches upon span areas such as HPC, audio processing, gaming, and the defense
>>>>> industry so it's not like we have an isolated case.
>>>>>
>>>>> > In a similar vein, if people are finding it necessary to "replace
>>>>> parts of NIO with hand-crafted native code" then it would be interesting to
>>>>> understand what their requirements are
>>>>>
>>>>> As for the other example I provided with making very short lived
>>>>> syscalls such as recvmsg/recvmmsg the premise is getting access to hardware
>>>>> timestamps on the ingress and egress ends as well as enabling batch receive
>>>>> with a single syscall and otherwise exploiting features unavailable from
>>>>> the JDK (like access to CMSG interface, scatter/gather, etc).
>>>>> There are also other examples of calls that we'd love to make often
>>>>> and at the lowest possible cost (e.g. getrusage) but I'm not sure if there's a
>>>>> strong case for some of these ideas, that's why it might be worth looking
>>>>> into a more generic approach for performance sensitive code.
>>>>> Hope this does a better job at explaining where we're coming from than
>>>>> my previous messages.
>>>>>
>>>>> Thanks,
>>>>> W
>>>>>
>>>>> On Tue, Jun 7, 2022 at 6:31 PM wrote:
>>>>>
>>>>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com:
>>>>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports
>>>>>> >> CLOCK_REALTIME.
>>>>>> >
>>>>>> > Unfortunately System.currentTimeMillis() offers only millisecond
>>>>>> > granularity which is the reason why our industry has to resort to
>>>>>> > clock_gettime.
>>>>>>
>>>>>> If the platform included, say, an intrinsified System.nanoRealTime()
>>>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would
>>>>>> that help developers in your unnamed industry?
>>>>>>
>>>>>> In a similar vein, if people are finding it necessary to "replace
>>>>>> parts of NIO with hand-crafted native code" then it would be interesting to
>>>>>> understand what their requirements are. Some simple enhancements to
>>>>>> the NIO API would be much less costly to design and implement than a
>>>>>> generalized user-level native-call intrinsification mechanism.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>> --
>> Sent from my phone
>>
> --
Sent from my phone

From vitalyd at gmail.com Mon Jul 4 17:38:14 2022
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Mon, 4 Jul 2022 13:38:14 -0400
Subject: Obsoleting JavaCritical
In-Reply-To:
References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com>
Message-ID:

To not sidetrack this thread with my previous reply:

Maurizio - are you saying Java criticals are *already* hindering ZGC and/or other planned HotSpot improvements? Or that theoretically they could and you'd like to remove/deprecate them now(ish)?

If it's the former, perhaps it's prudent to keep them around until a compelling case surfaces where they preclude or severely restrict evolution of the platform? If it's the former, would be curious what that is but would also understand the rationale behind wanting to remove it.

On Mon, Jul 4, 2022 at 1:26 PM Vitaly Davidovich wrote:

>
>
> On Mon, Jul 4, 2022 at 1:13 PM Wojciech Kudla
> wrote:
>
>> Thanks for your input, Vitaly. I'd be interested to find out more about
>> the nature of the HW noise you observed in your benchmarks as our results
>> were very consistent and it was pretty straightforward to pinpoint the
>> culprit as JNI call overhead. Maybe it was just easier for us because we
>> disallow C- and P-state transitions and put a lot of effort into eliminating
>> platform jitter in general. Were you maybe running on a CPU model that
>> doesn't support constant TSC? I would also suggest retrying with LAPIC
>> interrupts suppressed (with: cli/sti) to maybe see if it's the kernel and
>> not the hardware.
>>
> This was on a Broadwell Xeon chipset with constant TSC. All the typical
> jitter sources were reduced: C/P states disabled in BIOS, max turbo
> enabled, IRQs steered away, core isolated, etc. By the way, by noise I
> don't mean the results themselves were noisy - they were constant run to
> run. I just meant the delta between normal vs critical JNI entrypoints was
> very minimal - i.e. "in the noise", particularly with rdtsc.
>
> I can try to remeasure on newer Intel but see below.
>
>>
>>
>> 100% agree on rdtsc(p) and snippets. There are some narrow use cases where
>> one can get some substantial speedups with direct access to prefetch or by
>> abusing misprediction to keep icache hot. These scenarios are sadly only
>> available with inline assembly. I know of a few shops that go to the length
>> of forking Graal, etc. to achieve that but am quite convinced such
>> capabilities would be welcome and utilized by many more groups if they were
>> easily accessible from Java.
>>
> I'm of the firm (and perhaps controversial for some :)) opinion these days
> that Java is simply the wrong platform/tool for low latency cases that
> warrant this level of control. There're very strong headwinds even outside
> of JNI costs. And the "real" problem with JNI, besides transition costs,
> is lack of inlining into the native calls. So even if JVM transition costs
> are fully eliminated, there's still an optimization fence due to lost
> inlining (not unlike native code calling native fns via shared libs).
>
> That's not to say that perf regressions are welcomed - nobody likes those :).
>
>>
>>
>> Thanks,
>> W.
>>
>> On Mon, Jul 4, 2022 at 5:51 PM Vitaly Davidovich
>> wrote:
>>
>>> I'd add rdtsc(p) wrapper functions to the list. These are usually
>>> either inline asm or compiler intrinsics in the JNI entrypoint. In
>>> addition, any native libs exposed via JNI that have "trivial" functions are
>>> also candidates for faster calling conventions. There're sometimes ways to
>>> mitigate the call overhead (e.g. batching) but it's not always feasible.
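The batching idea mentioned in the quoted message above can be sketched as follows. This is only an illustration under stated assumptions: the "native" entry points are simulated in plain Java, and the names `sumOne`, `sumBatch`, and the `crossings` counter are hypothetical, not part of any JDK or JNI API. It shows how the number of boundary crossings drops from one per item to one per batch; it says nothing about the actual cost of a JNI transition.

```java
final class BatchingSketch {
    /** Counts simulated Java-to-native crossings; a real implementation would use native methods. */
    static int crossings = 0;

    /** Hypothetical per-item entry point: one crossing for every value processed. */
    static long sumOne(long acc, long value) {
        crossings++;
        return acc + value;
    }

    /** Hypothetical batched entry point: one crossing for the whole array. */
    static long sumBatch(long[] values) {
        crossings++;
        long acc = 0;
        for (long v : values) {
            acc += v;
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4};

        // Per-item calls: the fixed call overhead is paid once per value.
        long acc = 0;
        for (long v : values) {
            acc = sumOne(acc, v);
        }
        System.out.println("per-item: sum=" + acc + " crossings=" + crossings);

        // Batched call: the fixed call overhead is paid once for all values.
        crossings = 0;
        long batched = sumBatch(values);
        System.out.println("batched:  sum=" + batched + " crossings=" + crossings);
    }
}
```

As the thread notes, this only helps when the caller can tolerate collecting work before crossing the boundary, which is often not the case for latency-sensitive timestamping.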
>>>
>>> I'll add that last time I tried to measure the improvement of Java
>>> criticals for clock_gettime (and rdtsc) it looked to be in the noise on the
>>> hardware I was testing on. It got to the point where I had to instrument the
>>> critical and normal JNI entrypoints to confirm the critical was being hit.
>>> The critical calling convention isn't significantly different *if* basic
>>> primitives (or no args at all) are passed as args. JNIEnv*, IIRC, is
>>> loaded from a register so that's minor. jclass (for static calls, which is
>>> what's relevant here) should be a compiled constant. Critical call still
>>> has a GCLocker check. So I'm not actually sure what the significant
>>> difference is for "lightweight" (i.e. few primitive or no args, primitive
>>> return types) calls.
>>>
>>> In general, I do think it'd be nice if there was a faster native call
>>> sequence, even if it comes with a caveat emptor and/or special requirements
>>> on the callee (not unlike the requirements for criticals). I think
>>> Vladimir Ivanov was working on "snippets" that allowed dynamic construction
>>> of a native call, possibly including assembly. Not sure where that
>>> exploration is these days, but that would be a welcome capability.
>>>
>>> My $.02. Happy 4th of July for those celebrating!
>>>
>>> Vitaly
>>>
>>> On Mon, Jul 4, 2022 at 12:04 PM Maurizio Cimadamore <
>>> maurizio.cimadamore at oracle.com> wrote:
>>>
>>>> Hi,
>>>> while I'm not an expert with some of the IO calls you mention (some of
>>>> my colleagues are more knowledgeable in this area, so I'm sure they will
>>>> have more info), my general sense is that, as with getrusage, if there is a
>>>> system call involved, you already pay a hefty price for the user to kernel
>>>> transition. On my machine this seems to cost around 200ns. In these cases,
>>>> using JNI critical to shave off a dozen nanoseconds (at best!) seems
>>>> just not worth it.
>>>> >>>> So, of the functions in your list, the ones in which I *believe* >>>> dropping transitions would have the most effect are (if we exclude getpid, >>>> for which another approach is possible) clock_gettime and getcpu, I >>>> believe, as they might use vdso [1], which typically brings the performance >>>> of these call closer to calls to shared lib functions. >>>> >>>> If you have examples e.g. where performance of recvmsg (or related >>>> calls) varies significantly between base JNI and critical JNI, please send >>>> them our way; I'm sure some of my colleagues would be intersted to take a >>>> look. >>>> >>>> Popping back a couple of levels, I think it would be helpful to also >>>> define what's an acceptable regression in this context. Of course, in an >>>> ideal world, we'd like to see no performance regression at all. But JNI >>>> critical is an unsupported interface, which might misbehave with modern >>>> garbage collectors (e.g. ZGC) and that requires quite a bit of internal >>>> complexity which might, in the medium/long run, hinder the evolution of the >>>> Java platform (all these things have _some_ cost, even if the cost is not >>>> directly material to developers). In this vein, I think calls like >>>> clock_gettime tend to be more problematic: as they complete very quickly, >>>> you see the cost of transitions a lot more. In other cases, where syscalls >>>> are involved, the cost associated to transitions are more likely to be "in >>>> the noise". Of course if we look at absolute numbers, dropping transitions >>>> would always yield "faster" code; but at the same time, going from 250ns to >>>> 245ns is very unlikely to result in visible performance difference when >>>> considering an application as a whole, so I think it's critical here to >>>> decide _which_ use cases to prioritize. >>>> >>>> I think a good outcome of this discussion would be if we could come to >>>> some shared understanding of which native calls are truly problematic (e.g. 
>>>> clock_gettime-like), and then for the JDK to provide better (and more >>>> maintainable) alternatives for those (which might even be faster than using >>>> critical JNI). >>>> >>>> Thanks >>>> Maurizio >>>> >>>> [1] - https://man7.org/linux/man-pages/man7/vdso.7.html >>>> On 04/07/2022 12:23, Wojciech Kudla wrote: >>>> >>>> Thanks Maurizio, >>>> >>>> I raised this case mainly about clock_gettime and recvmsg/sendmsg, I >>>> think we're focusing on the wrong things here. Feel free to drop the two >>>> syscalls from the discussion entirely, but the main usecases I have been >>>> presenting throughout this thread definitely stand. >>>> >>>> Thanks >>>> >>>> >>>> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore < >>>> maurizio.cimadamore at oracle.com> wrote: >>>> >>>>> Hi Wojtek, >>>>> thanks for sharing this list, I think this is a good starting point to >>>>> understand more about your use case. >>>>> >>>>> Last week I've been looking at "getrusage" (as you mentioned it in an >>>>> earlier email), and I was surprised to see that the call took a pointer to >>>>> a (fairly big) struct which then needed to be initialized with some >>>>> thread-local state: >>>>> >>>>> https://man7.org/linux/man-pages/man2/getrusage.2.html >>>>> >>>>> I've looked at the implementation, and it seems to be doing memset on >>>>> the user-provided struct pointer, plus all the fields assignment. >>>>> Eyeballing the implementation, this does not seem to me like a "classic" >>>>> use case where dropping transition would help much. I mean, surely dropping >>>>> transitions would help shaving some nanoseconds off the call, but it >>>>> doesn't seem to me that the call would be shortlived enough to make a >>>>> difference. Do you have some benchmarks on this one? I did some [1] and the >>>>> call overhead seemed to come up at 260ns/op - w/o transition you might >>>>> perhaps be able to get to 250ns, but that's in the noise? 
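For scale, the figures quoted above (260ns/op, perhaps 250ns without the transition) can be turned into a relative saving. The sketch below is plain arithmetic, not a benchmark. The 30ns and 20ns figures for a very short call are made-up numbers chosen only to show why the same fixed saving matters more for fast calls; they are not measurements from this thread.

```java
final class TransitionShare {
    /** Fraction of the total call time removed when the cost drops from beforeNs to afterNs. */
    static double relativeSaving(double beforeNs, double afterNs) {
        return (beforeNs - afterNs) / beforeNs;
    }

    public static void main(String[] args) {
        // Figures from the quoted getrusage measurement: 260ns -> 250ns.
        System.out.printf("getrusage-like call: %.1f%% saved%n",
                100 * relativeSaving(260, 250));

        // Hypothetical short call: the same 10ns saving is a much larger share.
        System.out.printf("short call (hypothetical): %.1f%% saved%n",
                100 * relativeSaving(30, 20));
    }
}
```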
>>>>> >>>>> As for getpid, note that you can do (since Java 9): >>>>> >>>>> ProcessHandle.current().pid(); >>>>> >>>>> I believe the impl caches the result, so it shouldn't even make the >>>>> native call. >>>>> >>>>> Maurizio >>>>> >>>>> [1] - >>>>> http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >>>>> On 02/07/2022 07:42, Wojciech Kudla wrote: >>>>> >>>>> Hi Maurizio, >>>>> >>>>> Thanks for staying on this. >>>>> >>>>> > Could you please provide a rough list of the native calls you make >>>>> where you believe critical JNI is having a real impact in the performance >>>>> of your application? >>>>> >>>>> From the top of my head: >>>>> clock_gettime >>>>> recvmsg >>>>> recvmmsg >>>>> sendmsg >>>>> sendmmsg >>>>> select >>>>> getpid >>>>> getcpu >>>>> getrusage >>>>> >>>>> > Also, could you please tell us whether any of these calls need to >>>>> interact with Java arrays? >>>>> No arrays or objects of any type involved. Everything happens by the >>>>> means of passing raw pointers as longs and using other primitive types as >>>>> function arguments. >>>>> >>>>> > In other words, do you use critical JNI to remove the cost >>>>> associated with thread transitions, or are you also taking advantage of >>>>> accessing on-heap memory _directly_ from native code? >>>>> Criticial JNI natives are used solely to remove the cost of >>>>> transitions. We don't get anywhere near java heap in native code. >>>>> >>>>> In general I think it makes a lot of sense for Java as a >>>>> language/platform to have some guards around unsafe code, but on the other >>>>> hand the popularity of libraries employing Unsafe and their success in more >>>>> performance-oriented corners of software engineering is a clear indicator >>>>> there is a need for the JVM to provide access to more low-level primitives >>>>> and mechanisms. 
>>>>> I think it's entirely fair to tell developers that all bets are off >>>>> when they get into some non-idiomatic scenarios but please don't take away >>>>> a feature that greatly contributed to Java's success. >>>>> >>>>> Kind regards, >>>>> Wojtek >>>>> >>>>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < >>>>> maurizio.cimadamore at oracle.com> wrote: >>>>> >>>>>> Hi Wojciech, >>>>>> picking up this thread again. After some internal discussion, we >>>>>> realize that we don't know enough about your use case. While re-enabling >>>>>> JNI critical would obviously provide a quick fix, we're afraid that (a) >>>>>> developers might end up depending on JNI critical when they don't need to >>>>>> (perhaps also unaware of the consequences of depending on it) and (b) that >>>>>> there might actually be _better_ (as in: much faster) solutions than using >>>>>> critical native calls to address at least some of your use cases (that >>>>>> seemed to be the case with the clock_gettime example you mentioned). Could >>>>>> you please provide a rough list of the native calls you make where you >>>>>> believe critical JNI is having a real impact in the performance of your >>>>>> application? Also, could you please tell us whether any of these calls need >>>>>> to interact with Java arrays? In other words, do you use critical JNI to >>>>>> remove the cost associated with thread transitions, or are you also taking >>>>>> advantage of accessing on-heap memory _directly_ from native code? >>>>>> >>>>>> Regards >>>>>> Maurizio >>>>>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>>>> >>>>>> Hi Mark, >>>>>> >>>>>> Thanks for your input and apologies for the delayed response. >>>>>> >>>>>> > If the platform included, say, an intrinsified System.nanoRealTime() >>>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>>>> that help developers in your unnamed industry? 
>>>>>>
>>>>>> Exposing a realtime clock with nanosecond granularity in the JDK would
>>>>>> be a great step forward. I should have made it clear that I represent
>>>>>> the fintech corner (investment banking to be exact) but the issues my message
>>>>>> touches upon span areas such as HPC, audio processing, gaming, and the defense
>>>>>> industry so it's not like we have an isolated case.
>>>>>>
>>>>>> > In a similar vein, if people are finding it necessary to "replace
>>>>>> parts of NIO with hand-crafted native code" then it would be interesting to
>>>>>> understand what their requirements are
>>>>>>
>>>>>> As for the other example I provided with making very short lived
>>>>>> syscalls such as recvmsg/recvmmsg the premise is getting access to hardware
>>>>>> timestamps on the ingress and egress ends as well as enabling batch receive
>>>>>> with a single syscall and otherwise exploiting features unavailable from
>>>>>> the JDK (like access to CMSG interface, scatter/gather, etc).
>>>>>> There are also other examples of calls that we'd love to make often
>>>>>> and at the lowest possible cost (e.g. getrusage) but I'm not sure if there's a
>>>>>> strong case for some of these ideas, that's why it might be worth looking
>>>>>> into a more generic approach for performance sensitive code.
>>>>>> Hope this does a better job at explaining where we're coming from than
>>>>>> my previous messages.
>>>>>>
>>>>>> Thanks,
>>>>>> W
>>>>>>
>>>>>> On Tue, Jun 7, 2022 at 6:31 PM wrote:
>>>>>>
>>>>>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com:
>>>>>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports
>>>>>>> >> CLOCK_REALTIME.
>>>>>>> >
>>>>>>> > Unfortunately System.currentTimeMillis() offers only millisecond
>>>>>>> > granularity which is the reason why our industry has to resort to
>>>>>>> > clock_gettime.
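As a point of comparison for the granularity complaint above, `java.time.Instant.now()` can report sub-millisecond wall-clock time on JDK 9 and later. The resolution actually delivered depends on the platform and JDK version, and the call is not claimed here to match the cost of a direct `clock_gettime`, so treat this as a sketch rather than a replacement for what the thread is asking for. The class and method names (`WallClockNanos`, `toEpochNanos`) are illustrative.

```java
import java.time.Instant;

final class WallClockNanos {
    /**
     * Converts an Instant to nanoseconds since the Unix epoch.
     * Uses exact arithmetic, so it throws ArithmeticException instead of
     * silently overflowing for instants far outside the present era.
     */
    static long toEpochNanos(Instant t) {
        long secondsAsNanos = Math.multiplyExact(t.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(secondsAsNanos, (long) t.getNano());
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Wall-clock time with whatever sub-millisecond resolution the platform provides.
        System.out.println("epoch nanos : " + toEpochNanos(now));
        // Millisecond-granularity reading, for comparison.
        System.out.println("epoch millis: " + System.currentTimeMillis());
    }
}
```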
>>>>>>>
>>>>>>> If the platform included, say, an intrinsified System.nanoRealTime()
>>>>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would
>>>>>>> that help developers in your unnamed industry?
>>>>>>>
>>>>>>> In a similar vein, if people are finding it necessary to "replace
>>>>>>> parts of NIO with hand-crafted native code" then it would be interesting to
>>>>>>> understand what their requirements are. Some simple enhancements to
>>>>>>> the NIO API would be much less costly to design and implement than a
>>>>>>> generalized user-level native-call intrinsification mechanism.
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>> --
>>> Sent from my phone
>>>
>> --
> Sent from my phone
>
--
Sent from my phone

From vitalyd at gmail.com Mon Jul 4 17:39:52 2022
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Mon, 4 Jul 2022 13:39:52 -0400
Subject: Obsoleting JavaCritical
In-Reply-To:
References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com>
Message-ID:

On Mon, Jul 4, 2022 at 1:38 PM Vitaly Davidovich wrote:

> To not sidetrack this thread with my previous reply:
>
> Maurizio - are you saying Java criticals are *already* hindering ZGC
> and/or other planned HotSpot improvements? Or that theoretically they could
> and you'd like to remove/deprecate them now(ish)?
>
> If it's the former,
>
Argh, sorry - meant to say if it's the latter.

> perhaps it's prudent to keep them around until a compelling case surfaces
> where they preclude or severely restrict evolution of the platform? If it's
> the former, would be curious what that is but would also understand the
> rationale behind wanting to remove it.
>
> On Mon, Jul 4, 2022 at 1:26 PM Vitaly Davidovich
> wrote:
>
>>
>>
>> On Mon, Jul 4, 2022 at 1:13 PM Wojciech Kudla
>> wrote:
>>
>>> Thanks for your input, Vitaly. I'd be interested to find out more about
>>> the nature of the HW noise you observed in your benchmarks as our results
>>> were very consistent and it was pretty straightforward to pinpoint the
>>> culprit as JNI call overhead. Maybe it was just easier for us because we
>>> disallow C- and P-state transitions and put a lot of effort into eliminating
>>> platform jitter in general. Were you maybe running on a CPU model that
>>> doesn't support constant TSC? I would also suggest retrying with LAPIC
>>> interrupts suppressed (with: cli/sti) to maybe see if it's the kernel and
>>> not the hardware.
>>>
>> This was on a Broadwell Xeon chipset with constant TSC. All the typical
>> jitter sources were reduced: C/P states disabled in BIOS, max turbo
>> enabled, IRQs steered away, core isolated, etc. By the way, by noise I
>> don't mean the results themselves were noisy - they were constant run to
>> run. I just meant the delta between normal vs critical JNI entrypoints was
>> very minimal - i.e. "in the noise", particularly with rdtsc.
>>
>> I can try to remeasure on newer Intel but see below.
>>
>>>
>>>
>>> 100% agree on rdtsc(p) and snippets. There are some narrow use cases where
>>> one can get some substantial speedups with direct access to prefetch or by
>>> abusing misprediction to keep icache hot. These scenarios are sadly only
>>> available with inline assembly. I know of a few shops that go to the length
>>> of forking Graal, etc. to achieve that but am quite convinced such
>>> capabilities would be welcome and utilized by many more groups if they were
>>> easily accessible from Java.
>>>
>> I'm of the firm (and perhaps controversial for some :)) opinion these
>> days that Java is simply the wrong platform/tool for low latency cases that
>> warrant this level of control. There're very strong headwinds even outside
>> of JNI costs. And the "real" problem with JNI, besides transition costs,
>> is lack of inlining into the native calls. So even if JVM transition costs
>> are fully eliminated, there's still an optimization fence due to lost
>> inlining (not unlike native code calling native fns via shared libs).
>>
>> That's not to say that perf regressions are welcomed - nobody likes those :).
>>
>>>
>>>
>>> Thanks,
>>> W.
>>>
>>> On Mon, Jul 4, 2022 at 5:51 PM Vitaly Davidovich
>>> wrote:
>>>
>>>> I'd add rdtsc(p) wrapper functions to the list. These are usually
>>>> either inline asm or compiler intrinsics in the JNI entrypoint. In
>>>> addition, any native libs exposed via JNI that have "trivial" functions are
>>>> also candidates for faster calling conventions. There're sometimes ways to
>>>> mitigate the call overhead (e.g. batching) but it's not always feasible.
>>>>
>>>> I'll add that last time I tried to measure the improvement of Java
>>>> criticals for clock_gettime (and rdtsc) it looked to be in the noise on the
>>>> hardware I was testing on. It got to the point where I had to instrument the
>>>> critical and normal JNI entrypoints to confirm the critical was being hit.
>>>> The critical calling convention isn't significantly different *if* basic
>>>> primitives (or no args at all) are passed as args. JNIEnv*, IIRC, is
>>>> loaded from a register so that's minor. jclass (for static calls, which is
>>>> what's relevant here) should be a compiled constant. Critical call still
>>>> has a GCLocker check. So I'm not actually sure what the significant
>>>> difference is for "lightweight" (i.e. few primitive or no args, primitive
>>>> return types) calls.
>>>>
>>>> In general, I do think it'd be nice if there was a faster native call
>>>> sequence, even if it comes with a caveat emptor and/or special requirements
>>>> on the callee (not unlike the requirements for criticals). I think
>>>> Vladimir Ivanov was working on "snippets"
that allowed dynamic construction >>>> of a native call, possibly including assembly. Not sure where that >>>> exploration is these days, but that would be a welcome capability. >>>> >>>> My $.02. Happy 4th of July for those celebrating! >>>> >>>> Vitaly >>>> >>>> On Mon, Jul 4, 2022 at 12:04 PM Maurizio Cimadamore < >>>> maurizio.cimadamore at oracle.com> wrote: >>>> >>>>> Hi, >>>>> while I'm not an expert with some of the IO calls you mention (some of >>>>> my colleagues are more knowledgeable in this area, so I'm sure they will >>>>> have more info), my general sense is that, as with getrusage, if there is a >>>>> system call involved, you already pay a hefty price for the user to kernel >>>>> transition. On my machine this seems to cost around 200ns. In these cases, >>>>> using JNI critical to shave off a dozen nanoseconds (at best!) seems >>>>> just not worth it. >>>>> >>>>> So, of the functions in your list, the ones in which I *believe* >>>>> dropping transitions would have the most effect are (if we exclude getpid, >>>>> for which another approach is possible) clock_gettime and getcpu, I >>>>> believe, as they might use vdso [1], which typically brings the performance >>>>> of these calls closer to calls to shared lib functions. >>>>> >>>>> If you have examples e.g. where performance of recvmsg (or related >>>>> calls) varies significantly between base JNI and critical JNI, please send >>>>> them our way; I'm sure some of my colleagues would be interested to take a >>>>> look. >>>>> >>>>> Popping back a couple of levels, I think it would be helpful to also >>>>> define what's an acceptable regression in this context. Of course, in an >>>>> ideal world, we'd like to see no performance regression at all. But JNI >>>>> critical is an unsupported interface, which might misbehave with modern >>>>> garbage collectors (e.g.
ZGC) and that requires quite a bit of internal >>>>> complexity which might, in the medium/long run, hinder the evolution of the >>>>> Java platform (all these things have _some_ cost, even if the cost is not >>>>> directly material to developers). In this vein, I think calls like >>>>> clock_gettime tend to be more problematic: as they complete very quickly, >>>>> you see the cost of transitions a lot more. In other cases, where syscalls >>>>> are involved, the cost associated to transitions are more likely to be "in >>>>> the noise". Of course if we look at absolute numbers, dropping transitions >>>>> would always yield "faster" code; but at the same time, going from 250ns to >>>>> 245ns is very unlikely to result in visible performance difference when >>>>> considering an application as a whole, so I think it's critical here to >>>>> decide _which_ use cases to prioritize. >>>>> >>>>> I think a good outcome of this discussion would be if we could come to >>>>> some shared understanding of which native calls are truly problematic (e.g. >>>>> clock_gettime-like), and then for the JDK to provide better (and more >>>>> maintainable) alternatives for those (which might even be faster than using >>>>> critical JNI). >>>>> >>>>> Thanks >>>>> Maurizio >>>>> >>>>> [1] - https://man7.org/linux/man-pages/man7/vdso.7.html >>>>> On 04/07/2022 12:23, Wojciech Kudla wrote: >>>>> >>>>> Thanks Maurizio, >>>>> >>>>> I raised this case mainly about clock_gettime and recvmsg/sendmsg, I >>>>> think we're focusing on the wrong things here. Feel free to drop the two >>>>> syscalls from the discussion entirely, but the main usecases I have been >>>>> presenting throughout this thread definitely stand. >>>>> >>>>> Thanks >>>>> >>>>> >>>>> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore < >>>>> maurizio.cimadamore at oracle.com> wrote: >>>>> >>>>>> Hi Wojtek, >>>>>> thanks for sharing this list, I think this is a good starting point >>>>>> to understand more about your use case. 
>>>>>> >>>>>> Last week I've been looking at "getrusage" (as you mentioned it in an >>>>>> earlier email), and I was surprised to see that the call took a pointer to >>>>>> a (fairly big) struct which then needed to be initialized with some >>>>>> thread-local state: >>>>>> >>>>>> https://man7.org/linux/man-pages/man2/getrusage.2.html >>>>>> >>>>>> I've looked at the implementation, and it seems to be doing memset on >>>>>> the user-provided struct pointer, plus all the fields assignment. >>>>>> Eyeballing the implementation, this does not seem to me like a "classic" >>>>>> use case where dropping transition would help much. I mean, surely dropping >>>>>> transitions would help shaving some nanoseconds off the call, but it >>>>>> doesn't seem to me that the call would be shortlived enough to make a >>>>>> difference. Do you have some benchmarks on this one? I did some [1] and the >>>>>> call overhead seemed to come up at 260ns/op - w/o transition you might >>>>>> perhaps be able to get to 250ns, but that's in the noise? >>>>>> >>>>>> As for getpid, note that you can do (since Java 9): >>>>>> >>>>>> ProcessHandle.current().pid(); >>>>>> >>>>>> I believe the impl caches the result, so it shouldn't even make the >>>>>> native call. >>>>>> >>>>>> Maurizio >>>>>> >>>>>> [1] - >>>>>> http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >>>>>> On 02/07/2022 07:42, Wojciech Kudla wrote: >>>>>> >>>>>> Hi Maurizio, >>>>>> >>>>>> Thanks for staying on this. >>>>>> >>>>>> > Could you please provide a rough list of the native calls you make >>>>>> where you believe critical JNI is having a real impact in the performance >>>>>> of your application? >>>>>> >>>>>> From the top of my head: >>>>>> clock_gettime >>>>>> recvmsg >>>>>> recvmmsg >>>>>> sendmsg >>>>>> sendmmsg >>>>>> select >>>>>> getpid >>>>>> getcpu >>>>>> getrusage >>>>>> >>>>>> > Also, could you please tell us whether any of these calls need to >>>>>> interact with Java arrays? 
>>>>>> No arrays or objects of any type involved. Everything happens by the >>>>>> means of passing raw pointers as longs and using other primitive types as >>>>>> function arguments. >>>>>> >>>>>> > In other words, do you use critical JNI to remove the cost >>>>>> associated with thread transitions, or are you also taking advantage of >>>>>> accessing on-heap memory _directly_ from native code? >>>>>> Critical JNI natives are used solely to remove the cost of >>>>>> transitions. We don't get anywhere near the Java heap in native code. >>>>>> >>>>>> In general I think it makes a lot of sense for Java as a >>>>>> language/platform to have some guards around unsafe code, but on the other >>>>>> hand the popularity of libraries employing Unsafe and their success in more >>>>>> performance-oriented corners of software engineering is a clear indicator >>>>>> there is a need for the JVM to provide access to more low-level primitives >>>>>> and mechanisms. >>>>>> I think it's entirely fair to tell developers that all bets are off >>>>>> when they get into some non-idiomatic scenarios but please don't take away >>>>>> a feature that greatly contributed to Java's success. >>>>>> >>>>>> Kind regards, >>>>>> Wojtek >>>>>> >>>>>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < >>>>>> maurizio.cimadamore at oracle.com> wrote: >>>>>> >>>>>>> Hi Wojciech, >>>>>>> picking up this thread again. After some internal discussion, we >>>>>>> realize that we don't know enough about your use case.
While re-enabling >>>>>>> JNI critical would obviously provide a quick fix, we're afraid that (a) >>>>>>> developers might end up depending on JNI critical when they don't need to >>>>>>> (perhaps also unaware of the consequences of depending on it) and (b) that >>>>>>> there might actually be _better_ (as in: much faster) solutions than using >>>>>>> critical native calls to address at least some of your use cases (that >>>>>>> seemed to be the case with the clock_gettime example you mentioned). Could >>>>>>> you please provide a rough list of the native calls you make where you >>>>>>> believe critical JNI is having a real impact in the performance of your >>>>>>> application? Also, could you please tell us whether any of these calls need >>>>>>> to interact with Java arrays? In other words, do you use critical JNI to >>>>>>> remove the cost associated with thread transitions, or are you also taking >>>>>>> advantage of accessing on-heap memory _directly_ from native code? >>>>>>> >>>>>>> Regards >>>>>>> Maurizio >>>>>>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>>>>> >>>>>>> Hi Mark, >>>>>>> >>>>>>> Thanks for your input and apologies for the delayed response. >>>>>>> >>>>>>> > If the platform included, say, an intrinsified >>>>>>> System.nanoRealTime() >>>>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>>>>> that help developers in your unnamed industry? >>>>>>> >>>>>>> Exposing realtime clock with nanosecond granularity in the JDK would >>>>>>> be a great step forward. I should have made it clear that I represent >>>>>>> fintech corner (investment banking to be exact) but the issues my message >>>>>>> touches upon span areas such as HPC, audio processing, gaming, and defense >>>>>>> industry so it's not like we have an isolated case. >>>>>>> >>>>>>> > In a similar vein, if people are finding it necessary to "replace >>>>>>> parts >>>>>>> of NIO with hand-crafted native code"
then it would be interesting to >>>>>>> understand what their requirements are >>>>>>> >>>>>>> As for the other example I provided with making very short-lived >>>>>>> syscalls such as recvmsg/recvmmsg the premise is getting access to hardware >>>>>>> timestamps on the ingress and egress ends as well as enabling batch receive >>>>>>> with a single syscall and otherwise exploiting features unavailable from >>>>>>> the JDK (like access to CMSG interface, scatter/gather, etc). >>>>>>> There are also other examples of calls that we'd love to make often >>>>>>> and at the lowest possible cost (e.g. getrusage) but I'm not sure if there's a >>>>>>> strong case for some of these ideas, that's why it might be worth looking >>>>>>> into a more generic approach for performance sensitive code. >>>>>>> Hope this does a better job at explaining where we're coming from than >>>>>>> my previous messages. >>>>>>> >>>>>>> Thanks, >>>>>>> W >>>>>>> >>>>>>> On Tue, Jun 7, 2022 at 6:31 PM wrote: >>>>>>> >>>>>>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>>>>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>>>>>>> >> CLOCK_REALTIME. >>>>>>>> > >>>>>>>> > Unfortunately System.currentTimeMillis() offers only millisecond >>>>>>>> > granularity which is the reason why our industry has to resort to >>>>>>>> > clock_gettime. >>>>>>>> >>>>>>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>>>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>>>>>> that help developers in your unnamed industry? >>>>>>>> >>>>>>>> In a similar vein, if people are finding it necessary to "replace >>>>>>>> parts >>>>>>>> of NIO with hand-crafted native code" then it would be interesting >>>>>>>> to >>>>>>>> understand what their requirements are. Some simple enhancements to >>>>>>>> the NIO API would be much less costly to design and implement than a >>>>>>>> generalized user-level native-call intrinsification mechanism.
>>>>>>>> >>>>>>>> - Mark >>>>>>>> >>>>>>> -- >>>> Sent from my phone >>>> >>> -- >> Sent from my phone >> > -- > Sent from my phone > -- Sent from my phone From vitalyd at gmail.com Mon Jul 4 20:29:41 2022 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 4 Jul 2022 16:29:41 -0400 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: On Mon, Jul 4, 2022 at 4:13 PM Maurizio Cimadamore < maurizio.cimadamore at oracle.com> wrote: > Thanks for the clarification, this is very helpful. > > I also assume that the case when "there's nothing to read" is common > enough to make a difference? > Kernel bypass networking is poll mode - you poll the NIC for events (rx and/or tx completions) using a user space driver; there are no interrupts and no syscalls. So yeah, when you poll for reads, you don't know a priori if there are frames to process - you only know that after doing the poll. A common scenario is polling UDP multicast flows. Besides OpenOnload, there's the lower level efvi stack (OO uses that internally), which some folks use directly from Java (with some shims/light abstractions in native code accessed via JNI). Mellanox has a similar user space driver, and of course there's also DPDK. > Maurizio > > > On 04/07/2022 13:50, Wojciech Kudla wrote: > > Hi Maurizio, > > You are correct that under normal circumstances syscalls that are not > supported by vDSO are very heavy but when we call recvmsg/sendmsg we don't > even perform a syscall at all. High frequency trading shops employ kernel > bypass for all network flows pretty much by default. The most popular > solution here is OpenOnload used with Xilinx products.
For a case when > there's nothing to read from the RX ring a JavaCritical JNI call to recvmsg > completes in ~11ns vs 23ns for a standard JNI call with full transition. > Sorry, I've been in this for so long I kind of assumed it's implied. > > Thanks, > W. > > On Mon, Jul 4, 2022 at 12:59 PM Maurizio Cimadamore < > maurizio.cimadamore at oracle.com> wrote: > >> Hi, >> while I'm not an expert with some of the IO calls you mention (some of my >> colleagues are more knowledgeable in this area, so I'm sure they will have >> more info), my general sense is that, as with getrusage, if there is a >> system call involved, you already pay a hefty price for the user to kernel >> transition. On my machine this seems to cost around 200ns. In these cases, >> using JNI critical to shave off a dozen nanoseconds (at best!) seems >> just not worth it. >> >> So, of the functions in your list, the ones in which I *believe* >> dropping transitions would have the most effect are (if we exclude getpid, >> for which another approach is possible) clock_gettime and getcpu, I >> believe, as they might use vdso [1], which typically brings the performance >> of these calls closer to calls to shared lib functions. >> >> If you have examples e.g. where performance of recvmsg (or related calls) >> varies significantly between base JNI and critical JNI, please send them >> our way; I'm sure some of my colleagues would be interested to take a look. >> >> Popping back a couple of levels, I think it would be helpful to also >> define what's an acceptable regression in this context. Of course, in an >> ideal world, we'd like to see no performance regression at all. But JNI >> critical is an unsupported interface, which might misbehave with modern >> garbage collectors (e.g.
ZGC) and that requires quite a bit of internal >> complexity which might, in the medium/long run, hinder the evolution of the >> Java platform (all these things have _some_ cost, even if the cost is not >> directly material to developers). In this vein, I think calls like >> clock_gettime tend to be more problematic: as they complete very quickly, >> you see the cost of transitions a lot more. In other cases, where syscalls >> are involved, the cost associated to transitions are more likely to be "in >> the noise". Of course if we look at absolute numbers, dropping transitions >> would always yield "faster" code; but at the same time, going from 250ns to >> 245ns is very unlikely to result in visible performance difference when >> considering an application as a whole, so I think it's critical here to >> decide _which_ use cases to prioritize. >> >> I think a good outcome of this discussion would be if we could come to >> some shared understanding of which native calls are truly problematic (e.g. >> clock_gettime-like), and then for the JDK to provide better (and more >> maintainable) alternatives for those (which might even be faster than using >> critical JNI). >> >> Thanks >> Maurizio >> >> [1] - https://man7.org/linux/man-pages/man7/vdso.7.html >> >> On 04/07/2022 12:23, Wojciech Kudla wrote: >> >> Thanks Maurizio, >> >> I raised this case mainly about clock_gettime and recvmsg/sendmsg, I >> think we're focusing on the wrong things here. Feel free to drop the two >> syscalls from the discussion entirely, but the main usecases I have been >> presenting throughout this thread definitely stand. >> >> Thanks >> >> >> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore < >> maurizio.cimadamore at oracle.com> wrote: >> >>> Hi Wojtek, >>> thanks for sharing this list, I think this is a good starting point to >>> understand more about your use case. 
>>> >>> Last week I've been looking at "getrusage" (as you mentioned it in an >>> earlier email), and I was surprised to see that the call took a pointer to >>> a (fairly big) struct which then needed to be initialized with some >>> thread-local state: >>> >>> https://man7.org/linux/man-pages/man2/getrusage.2.html >>> >>> >>> I've looked at the implementation, and it seems to be doing memset on >>> the user-provided struct pointer, plus all the fields assignment. >>> Eyeballing the implementation, this does not seem to me like a "classic" >>> use case where dropping transition would help much. I mean, surely dropping >>> transitions would help shaving some nanoseconds off the call, but it >>> doesn't seem to me that the call would be shortlived enough to make a >>> difference. Do you have some benchmarks on this one? I did some [1] and the >>> call overhead seemed to come up at 260ns/op - w/o transition you might >>> perhaps be able to get to 250ns, but that's in the noise? >>> >>> As for getpid, note that you can do (since Java 9): >>> >>> ProcessHandle.current().pid(); >>> >>> I believe the impl caches the result, so it shouldn't even make the >>> native call. >>> >>> Maurizio >>> >>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >>> On 02/07/2022 07:42, Wojciech Kudla wrote: >>> >>> Hi Maurizio, >>> >>> Thanks for staying on this. >>> >>> > Could you please provide a rough list of the native calls you make >>> where you believe critical JNI is having a real impact in the performance >>> of your application? >>> >>> From the top of my head: >>> clock_gettime >>> recvmsg >>> recvmmsg >>> sendmsg >>> sendmmsg >>> select >>> getpid >>> getcpu >>> getrusage >>> >>> > Also, could you please tell us whether any of these calls need to >>> interact with Java arrays? >>> No arrays or objects of any type involved. Everything happens by the >>> means of passing raw pointers as longs and using other primitive types as >>> function arguments. 
>>> >>> > In other words, do you use critical JNI to remove the cost associated >>> with thread transitions, or are you also taking advantage of accessing >>> on-heap memory _directly_ from native code? >>> Critical JNI natives are used solely to remove the cost of transitions. >>> We don't get anywhere near the Java heap in native code. >>> >>> In general I think it makes a lot of sense for Java as a >>> language/platform to have some guards around unsafe code, but on the other >>> hand the popularity of libraries employing Unsafe and their success in more >>> performance-oriented corners of software engineering is a clear indicator >>> there is a need for the JVM to provide access to more low-level primitives >>> and mechanisms. >>> I think it's entirely fair to tell developers that all bets are off when >>> they get into some non-idiomatic scenarios but please don't take away a >>> feature that greatly contributed to Java's success. >>> >>> Kind regards, >>> Wojtek >>> >>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore < >>> maurizio.cimadamore at oracle.com> wrote: >>> >>>> Hi Wojciech, >>>> picking up this thread again. After some internal discussion, we >>>> realize that we don't know enough about your use case. While re-enabling >>>> JNI critical would obviously provide a quick fix, we're afraid that (a) >>>> developers might end up depending on JNI critical when they don't need to >>>> (perhaps also unaware of the consequences of depending on it) and (b) that >>>> there might actually be _better_ (as in: much faster) solutions than using >>>> critical native calls to address at least some of your use cases (that >>>> seemed to be the case with the clock_gettime example you mentioned). Could >>>> you please provide a rough list of the native calls you make where you >>>> believe critical JNI is having a real impact in the performance of your >>>> application? Also, could you please tell us whether any of these calls need >>>> to interact with Java arrays?
In other words, do you use critical JNI to >>>> remove the cost associated with thread transitions, or are you also taking >>>> advantage of accessing on-heap memory _directly_ from native code? >>>> >>>> Regards >>>> Maurizio >>>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>> >>>> Hi Mark, >>>> >>>> Thanks for your input and apologies for the delayed response. >>>> >>>> > If the platform included, say, an intrinsified System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>> that help developers in your unnamed industry? >>>> >>>> Exposing realtime clock with nanosecond granularity in the JDK would be >>>> a great step forward. I should have made it clear that I represent fintech >>>> corner (investment banking to be exact) but the issues my message touches >>>> upon span areas such as HPC, audio processing, gaming, and defense industry >>>> so it's not like we have an isolated case. >>>> >>>> > In a similar vein, if people are finding it necessary to "replace >>>> parts >>>> of NIO with hand-crafted native code" then it would be interesting to >>>> understand what their requirements are >>>> >>>> As for the other example I provided with making very short-lived >>>> syscalls such as recvmsg/recvmmsg the premise is getting access to hardware >>>> timestamps on the ingress and egress ends as well as enabling batch receive >>>> with a single syscall and otherwise exploiting features unavailable from >>>> the JDK (like access to CMSG interface, scatter/gather, etc). >>>> There are also other examples of calls that we'd love to make often and >>>> at the lowest possible cost (e.g. getrusage) but I'm not sure if there's a >>>> strong case for some of these ideas, that's why it might be worth looking >>>> into a more generic approach for performance sensitive code. >>>> Hope this does a better job at explaining where we're coming from than my >>>> previous messages.
>>>> >>>> Thanks, >>>> W >>>> >>>> On Tue, Jun 7, 2022 at 6:31 PM wrote: >>>> >>>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>>>> >> CLOCK_REALTIME. >>>>> > >>>>> > Unfortunately System.currentTimeMillis() offers only millisecond >>>>> > granularity which is the reason why our industry has to resort to >>>>> > clock_gettime. >>>>> >>>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>>> that help developers in your unnamed industry? >>>>> >>>>> In a similar vein, if people are finding it necessary to "replace parts >>>>> of NIO with hand-crafted native code" then it would be interesting to >>>>> understand what their requirements are. Some simple enhancements to >>>>> the NIO API would be much less costly to design and implement than a >>>>> generalized user-level native-call intrinsification mechanism. >>>>> >>>>> - Mark >>>>> >>>> -- Sent from my phone From maurizio.cimadamore at oracle.com Mon Jul 4 21:02:33 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 4 Jul 2022 22:02:33 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: <8ac595b5-9969-3b37-74bc-270b268be5dd@oracle.com> On 04/07/2022 17:50, Vitaly Davidovich wrote: > In general, I do think it'd be nice if there was a faster native call > sequence, even if it comes with a caveat emptor and/or special > requirements on the callee (not unlike the requirements for > criticals). I think Vladimir Ivanov was working on "snippets"
that > allowed dynamic construction of a native call, possibly including > assembly. Not sure where that exploration is these days, but that > would be a welcome capability. For the record, this was already discussed in a related thread: https://mail.openjdk.org/pipermail/panama-dev/2022-June/017056.html Maurizio From erik.osterlund at oracle.com Mon Jul 4 21:47:12 2022 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 4 Jul 2022 21:47:12 +0000 Subject: Obsoleting JavaCritical In-Reply-To: <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> Hi, Here is a clarification on the ZGC interactions. The initial form of JNI critical native calls was implemented as an internal thing for SPARC crypto libraries, private to the JDK. JNI calls on SPARC involved flushing register windows, which was actually rather slow. This form came with a mechanism for lazily activating the GC locker for primitive arrays that the crypto code needed direct access to. This essentially deferred invoking the GC locker from the Java thread to the safepoint synchronizer. The problematic aspect for generational ZGC was the async GC locker interactions. Its implication is that each GC safepoint might fail, because the GC locker can't be locked out before the safepoint is synchronized, so you end up instead trying to lock it inside GC safepoints, only to find that you couldn't. The failed GC safepoints lead to GC operations instead being started asynchronously from the GC locker. That was easier to deal with for the mainline version of ZGC since there was only one type of GC: full GCs. So we coped.
With generational ZGC, the asynchronous operation has to figure out if it should poke the minor (young) and/or major (young + old) GC drivers. That problem is not easy to solve. However, with JNI critical natives gone, the entire GC locker for ZGC is just a simple readers-writer lock, where critical native functions use the readers lock and the GC operations use the writer lock. The GC safepoints can't fail. With the new implementation that avoids doing a transition to native at all, the mentioned problem no longer occurs, as the safepoint synchronizer won't allow safepoints to creep in right in the middle of all this. So it would seem we are okay with that. So I think as long as we don't go with the previous async GC locker solution, we can remove ZGC interactions from the equation. However, you obviously get a trust problem instead with this flavour of cheating the system. Anything that takes a longish time in a critical native function without a native transition is going to be a disaster and hang the entire JVM. That is typically something we do not take lightly and is indeed why we have native transitions. So I would be delighted if we didn't resurrect ways of cheating the system anyway, unless this is absolutely... critical. It took a long time to get rid of the cheats. /Erik On 4 Jul 2022, at 18:07, Maurizio Cimadamore wrote: Hi, while I'm not an expert with some of the IO calls you mention (some of my colleagues are more knowledgeable in this area, so I'm sure they will have more info), my general sense is that, as with getrusage, if there is a system call involved, you already pay a hefty price for the user to kernel transition. On my machine this seems to cost around 200ns. In these cases, using JNI critical to shave off a dozen nanoseconds (at best!) seems just not worth it.
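As a rough illustration of the trade-off described above, the sketch below compares the relative saving from dropping the transition for two cases whose figures are quoted in this thread: getrusage (about 260 ns with the transition, an estimated 250 ns without) and recvmsg over kernel bypass with an empty RX ring (about 23 ns for standard JNI versus about 11 ns for critical JNI). The class and method names are hypothetical, and the inputs are the posters' own measurements or estimates, not new results.

```java
public final class TransitionSavings {

    /** Fraction of the call time removed when the cost drops from beforeNs to afterNs. */
    public static double relativeSaving(double beforeNs, double afterNs) {
        return (beforeNs - afterNs) / beforeNs;
    }

    public static void main(String[] args) {
        // getrusage: ~260 ns measured, ~250 ns estimated without the transition.
        double getrusage = relativeSaving(260, 250);
        // recvmsg, kernel bypass, nothing to read: ~23 ns standard JNI vs ~11 ns critical JNI.
        double recvmsg = relativeSaving(23, 11);

        System.out.println("getrusage saving fraction: " + getrusage);
        System.out.println("recvmsg saving fraction:   " + recvmsg);
    }
}
```

With these inputs the saving is roughly 4% for getrusage and roughly 52% for the kernel-bypass recvmsg, which may explain why the two sides of the thread weigh the transition cost so differently.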
So, of the functions in your list, the ones in which I *believe* dropping transitions would have the most effect are (if we exclude getpid, for which another approach is possible) clock_gettime and getcpu, I believe, as they might use vdso [1], which typically brings the performance of these calls closer to calls to shared lib functions. If you have examples e.g. where performance of recvmsg (or related calls) varies significantly between base JNI and critical JNI, please send them our way; I'm sure some of my colleagues would be interested to take a look. Popping back a couple of levels, I think it would be helpful to also define what's an acceptable regression in this context. Of course, in an ideal world, we'd like to see no performance regression at all. But JNI critical is an unsupported interface, which might misbehave with modern garbage collectors (e.g. ZGC) and that requires quite a bit of internal complexity which might, in the medium/long run, hinder the evolution of the Java platform (all these things have _some_ cost, even if the cost is not directly material to developers). In this vein, I think calls like clock_gettime tend to be more problematic: as they complete very quickly, you see the cost of transitions a lot more. In other cases, where syscalls are involved, the cost associated to transitions are more likely to be "in the noise". Of course if we look at absolute numbers, dropping transitions would always yield "faster" code; but at the same time, going from 250ns to 245ns is very unlikely to result in visible performance difference when considering an application as a whole, so I think it's critical here to decide _which_ use cases to prioritize. I think a good outcome of this discussion would be if we could come to some shared understanding of which native calls are truly problematic (e.g.
clock_gettime-like), and then for the JDK to provide better (and more maintainable) alternatives for those (which might even be faster than using critical JNI). Thanks Maurizio [1] - https://man7.org/linux/man-pages/man7/vdso.7.html On 04/07/2022 12:23, Wojciech Kudla wrote: Thanks Maurizio, I raised this case mainly about clock_gettime and recvmsg/sendmsg, I think we're focusing on the wrong things here. Feel free to drop the two syscalls from the discussion entirely, but the main usecases I have been presenting throughout this thread definitely stand. Thanks On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore > wrote: Hi Wojtek, thanks for sharing this list, I think this is a good starting point to understand more about your use case. Last week I've been looking at "getrusage" (as you mentioned it in an earlier email), and I was surprised to see that the call took a pointer to a (fairly big) struct which then needed to be initialized with some thread-local state: https://man7.org/linux/man-pages/man2/getrusage.2.html I've looked at the implementation, and it seems to be doing memset on the user-provided struct pointer, plus all the fields assignment. Eyeballing the implementation, this does not seem to me like a "classic" use case where dropping transition would help much. I mean, surely dropping transitions would help shaving some nanoseconds off the call, but it doesn't seem to me that the call would be shortlived enough to make a difference. Do you have some benchmarks on this one? I did some [1] and the call overhead seemed to come up at 260ns/op - w/o transition you might perhaps be able to get to 250ns, but that's in the noise? As for getpid, note that you can do (since Java 9): ProcessHandle.current().pid(); I believe the impl caches the result, so it shouldn't even make the native call. Maurizio [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java On 02/07/2022 07:42, Wojciech Kudla wrote: Hi Maurizio, Thanks for staying on this. 
> Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? From the top of my head: clock_gettime recvmsg recvmmsg sendmsg sendmmsg select getpid getcpu getrusage > Also, could you please tell us whether any of these calls need to interact with Java arrays? No arrays or objects of any type involved. Everything happens by the means of passing raw pointers as longs and using other primitive types as function arguments. > In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? Critical JNI natives are used solely to remove the cost of transitions. We don't get anywhere near the Java heap in native code. In general I think it makes a lot of sense for Java as a language/platform to have some guards around unsafe code, but on the other hand the popularity of libraries employing Unsafe and their success in more performance-oriented corners of software engineering is a clear indicator there is a need for the JVM to provide access to more low-level primitives and mechanisms. I think it's entirely fair to tell developers that all bets are off when they get into some non-idiomatic scenarios but please don't take away a feature that greatly contributed to Java's success. Kind regards, Wojtek On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore > wrote: Hi Wojciech, picking up this thread again. After some internal discussion, we realize that we don't know enough about your use case.
While re-enabling JNI critical would obviously provide a quick fix, we're afraid that (a) developers might end up depending on JNI critical when they don't need to (perhaps also unaware of the consequences of depending on it) and (b) that there might actually be _better_ (as in: much faster) solutions than using critical native calls to address at least some of your use cases (that seemed to be the case with the clock_gettime example you mentioned). Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? Also, could you please tell us whether any of these calls need to interact with Java arrays? In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? Regards Maurizio On 13/06/2022 21:38, Wojciech Kudla wrote: Hi Mark, Thanks for your input and apologies for the delayed response. > If the platform included, say, an intrinsified System.nanoRealTime() method that returned clock_gettime(CLOCK_REALTIME), how much would that help developers in your unnamed industry? Exposing a realtime clock with nanosecond granularity in the JDK would be a great step forward. I should have made it clear that I represent the fintech corner (investment banking to be exact) but the issues my message touches upon span areas such as HPC, audio processing, gaming, and the defense industry so it's not like we have an isolated case. > In a similar vein, if people are finding it necessary to "replace parts of NIO with hand-crafted native code"
then it would be interesting to understand what their requirements are As for the other example I provided with making very short lived syscalls such as recvmsg/recvmmsg the premise is getting access to hardware timestamps on the ingress and egress ends as well as enabling batch receive with a single syscall and otherwise exploiting features unavailable from the JDK (like access to CMSG interface, scatter/gather, etc). There are also other examples of calls that we'd love to make often and at lowest possible cost (e.g. getrusage) but I'm not sure if there's a strong case for some of these ideas, that's why it might be worth looking into a more generic approach for performance sensitive code. Hope this does a better job at explaining where we're coming from than my previous messages. Thanks, W On Tue, Jun 7, 2022 at 6:31 PM > wrote: 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >> CLOCK_REALTIME. > > Unfortunately System.currentTimeMillis() offers only millisecond > granularity which is the reason why our industry has to resort to > clock_gettime. If the platform included, say, an intrinsified System.nanoRealTime() method that returned clock_gettime(CLOCK_REALTIME), how much would that help developers in your unnamed industry? In a similar vein, if people are finding it necessary to "replace parts of NIO with hand-crafted native code" then it would be interesting to understand what their requirements are. Some simple enhancements to the NIO API would be much less costly to design and implement than a generalized user-level native-call intrinsification mechanism. - Mark
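For comparison with the hypothetical System.nanoRealTime() discussed above (it is not an existing JDK method), here is a minimal sketch of reading the wall clock with a nanosecond field using only the standard java.time API. The class and method names (RealtimeNanos, epochNanos) are illustrative. The resolution actually delivered depends on the JDK version and the operating system, and Instant.now() may allocate, so this is not claimed to match a critical-JNI call to clock_gettime in cost.

```java
import java.time.Instant;

public class RealtimeNanos {
    // Wall-clock time expressed as nanoseconds since the Unix epoch.
    // The nanosecond field is only as precise as the underlying
    // system clock exposed by the JDK on this platform.
    static long epochNanos() {
        Instant now = Instant.now();
        long secondsAsNanos = Math.multiplyExact(now.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(secondsAsNanos, now.getNano());
    }

    public static void main(String[] args) {
        System.out.println("epoch nanos  = " + epochNanos());
        System.out.println("epoch millis = " + System.currentTimeMillis());
    }
}
```

System.nanoTime() is deliberately not used here: as noted in the thread, it is a monotonic timer and does not report CLOCK_REALTIME.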
From iklam at openjdk.org Mon Jul 4 23:18:17 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 4 Jul 2022 23:18:17 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp Message-ID: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Please review this simple change that only renames a few classes and moves some code around. No functional changes. The following classes are used only sparingly. They should be moved to a new header file share/runtime/suspend.hpp to minimize the size of os.hpp - SuspendedThreadTaskContext - SuspendedThreadTask - SuspendResume I didn't move the OS-specific implementation to a new file -- the POSIX implementation is currently inside [signals_posix.cpp](https://github.com/openjdk/jdk/blob/df063f7db18a40ea7325fe608b3206a6dff812c1/src/hotspot/os/posix/signals_posix.cpp#L1790) mixed with other signal handling code, so it doesn't seem like a good idea to move out just the code for the above 3 classes. The only other implementation is in os_windows.cpp. I could move the code to suspend_windows.cpp, but I don't feel very motivated unless someone insists.
------------- Commit messages: - 8289710: Move Suspend/Resume classes out of os.hpp Changes: https://git.openjdk.org/jdk/pull/9371/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9371&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289710 Stats: 322 lines in 12 files changed: 172 ins; 114 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/9371.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9371/head:pull/9371 PR: https://git.openjdk.org/jdk/pull/9371 From dholmes at openjdk.org Mon Jul 4 23:58:25 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 4 Jul 2022 23:58:25 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v3] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 13:31:43 GMT, Thomas Stuefe wrote: >> [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. >> >> We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. >> >> I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. 
>> >> Places that allow raw C functions: >> - decoder on Linux, since the C++ demangler returns raw C heap >> - realpath, in conjunction with allowing real free for the returned buffer >> - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point >> - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities >> - obviously os::malloc and friends >> - NMT pre-initialization code because circularities >> - In gtest main function - I think gtest should work always, even if os::malloc is broken. >> >> Places I fixed: >> - ZGC, mountpoint string handling >> - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay >> - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter >> - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this >> - A couple of places in gtests. >> >> Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. >> >> ---- >> >> Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. >> >> GHAs are in work. >> >> >> [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Use our realpath() wrapper in os_perf_linux.cpp Hi Thomas, Generally this looks okay. Not sure whether this will impact any of the NMT tests as we will now be using NMT in new places. 
At least one place where it isn't obvious there is a matching os::free for os::strdup. Thanks. src/hotspot/share/compiler/compilerEvent.cpp line 105: > 103: > 104: index = phase_names->length(); > 105: phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); Where is the `os::free()` to pair with this allocation? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9356 From kbarrett at openjdk.org Tue Jul 5 01:09:28 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 5 Jul 2022 01:09:28 GMT Subject: [jdk19] RFR: 8288759: GCC 12 fails to compile signature.cpp due to -Wstringop-overread [v2] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 12:12:40 GMT, Aleksey Shipilev wrote: >> Trying to compile with GCC 12.1.1 (current Fedora Rawhide) yields this failure: >> >> >> In file included from /home/test/shipilev-jdk/src/hotspot/share/utilities/globalDefinitions_gcc.hpp:35, >> from /home/test/shipilev-jdk/src/hotspot/share/utilities/globalDefinitions.hpp:35, >> from /home/test/shipilev-jdk/src/hotspot/share/memory/allocation.hpp:29, >> from /home/test/shipilev-jdk/src/hotspot/share/classfile/classLoaderData.hpp:28, >> from /home/test/shipilev-jdk/src/hotspot/share/precompiled/precompiled.hpp:34: >> In function 'const void* memchr(const void*, int, size_t)', >> inlined from 'int SignatureStream::scan_type(BasicType)' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:343:32, >> inlined from 'void SignatureStream::next()' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:373:19, >> inlined from 'void SignatureIterator::do_parameters_on(T*) [with T = Fingerprinter]' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.hpp:635:41, >> inlined from 'void SignatureIterator::do_parameters_on(T*) [with T = Fingerprinter]' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.hpp:629:6, >> inlined from 'void 
Fingerprinter::compute_fingerprint_and_return_type(bool)' at /home/test/shipilev-jdk/src/hotspot/share/runtime/signature.cpp:169:19: > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Better fix the actual warning > - Merge branch 'master' into JDK-8288759-gcc12-string-overread > - Fix The proposed fix seems plausible, and addresses the warning. I was a little worried that someone might come along later and try to change it to just assert(end < limit, "invalid type"); but presumably that will get the warning again during testing. I tried to think of a less contrived way to write this while still addressing the warning. I haven't come up with anything better, assuming returning limit is okay. But I'm entirely unfamiliar with the signature code, so don't know if returning limit is okay. So don't count me as a reviewer for this change (and I won't hit the Approve button). ------------- PR: https://git.openjdk.org/jdk19/pull/49 From kbarrett at openjdk.org Tue Jul 5 02:20:31 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 5 Jul 2022 02:20:31 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v3] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 13:31:43 GMT, Thomas Stuefe wrote: >> [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. >> >> We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. 
Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. >> >> I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. >> >> Places that allow raw C functions: >> - decoder on Linux, since the C++ demangler returns raw C heap >> - realpath, in conjunction with allowing real free for the returned buffer >> - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point >> - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities >> - obviously os::malloc and friends >> - NMT pre-initialization code because circularities >> - In gtest main function - I think gtest should work always, even if os::malloc is broken. >> >> Places I fixed: >> - ZGC, mountpoint string handling >> - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay >> - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter >> - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this >> - A couple of places in gtests. >> >> Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. 
>> >> ---- >> >> Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. >> >> GHAs are in work. >> >> >> [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Use our realpath() wrapper in os_perf_linux.cpp Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9356 From stuefe at openjdk.org Tue Jul 5 03:46:41 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 5 Jul 2022 03:46:41 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v3] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 23:49:20 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Use our realpath() wrapper in os_perf_linux.cpp > > src/hotspot/share/compiler/compilerEvent.cpp line 105: > >> 103: >> 104: index = phase_names->length(); >> 105: phase_names->append(use_strdup ? os::strdup(phase_name) : phase_name); > > Where is the `os::free()` to pair with this allocation? AFAICS this is a static global table of phase names that only ever grows and is never deleted. ------------- PR: https://git.openjdk.org/jdk/pull/9356 From stuefe at openjdk.org Tue Jul 5 04:29:41 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 5 Jul 2022 04:29:41 GMT Subject: RFR: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings [v3] In-Reply-To: References: Message-ID: On Mon, 4 Jul 2022 13:31:43 GMT, Thomas Stuefe wrote: >> [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. 
That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. >> >> We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. >> >> I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. >> >> Places that allow raw C functions: >> - decoder on Linux, since the C++ demangler returns raw C heap >> - realpath, in conjunction with allowing real free for the returned buffer >> - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point >> - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities >> - obviously os::malloc and friends >> - NMT pre-initialization code because circularities >> - In gtest main function - I think gtest should work always, even if os::malloc is broken. >> >> Places I fixed: >> - ZGC, mountpoint string handling >> - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay >> - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter >> - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this >> - A couple of places in gtests. 
>> >> Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. >> >> ---- >> >> Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. >> >> GHAs are in work. >> >> >> [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Use our realpath() wrapper in os_perf_linux.cpp Hi David, > Hi Thomas, > > Generally this looks okay. Not sure whether this will impact any of the NMT tests as we will now be using NMT in new places. NMT tests are fine. The additions are minor. We may now see more leaks, since two of the places I adapted did not free the memory; so if Oracle has some sort of NMT-based leak testing internally, that may now fail. > > At least one place where it isn't obvious there is a matching os::free for os::strdup. > > Thanks. Thanks David! ------------- PR: https://git.openjdk.org/jdk/pull/9356 From stuefe at openjdk.org Tue Jul 5 04:29:42 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 5 Jul 2022 04:29:42 GMT Subject: Integrated: JDK-8289633: Forbid raw C-heap allocation functions in hotspot and fix findings In-Reply-To: References: Message-ID: <0R78iOIzVdb82UNu-BDoWgDy2LWlraV5aX1sY6349hk=.bc11143e-6b64-4839-811a-f6c66b919061@github.com> On Sun, 3 Jul 2022 08:04:09 GMT, Thomas Stuefe wrote: > [JDK-8214976](https://bugs.openjdk.org/browse/JDK-8214976) introduced a way to forbid functions from being called outside of explicitly allowed contexts. Kim [1] proposed to use that functionality to forbid raw malloc and friends. That would have prevented [JDK-8289477](https://bugs.openjdk.org/browse/JDK-8289477), which sneaked in raw malloc and free via C-runtime defined macros. 
> > We forbid now all functions that return C-heap, even if that is only optional, like with `realpath`. Note that there may be more functions, but these are all I know from the top of my head. We forbid them even if they are exotic to prevent devs from using them in the future and also from creep-in via system macros. > > I found a number of places where raw allocation functions were used, mostly strdup. I either changes those places to use os::xxx where I was confident that works, or where I saw we really must use the raw functions I marked them with ALLOW_C_FUNCTION. > > Places that allow raw C functions: > - decoder on Linux, since the C++ demangler returns raw C heap > - realpath, in conjunction with allowing real free for the returned buffer > - ZGC uses posix_memalign for a static global buffer that never is deleted. Keeping to use posix_memalign is probably ok, but we should add an os::posix_memalign at some point > - UL, LogTagSet, since UL may also be used for logging inside NMT and we don't want circularities > - obviously os::malloc and friends > - NMT pre-initialization code because circularities > - In gtest main function - I think gtest should work always, even if os::malloc is broken. > > Places I fixed: > - ZGC, mountpoint string handling > - In CompilerEvent we hold a global lookup table with phase names. The names in there leak, but this table never gets cleared, so I think that's okay > - gcLogPrecious, string is fed to VMError::report_and_die, so it probably does not matter > - there were several places in JVMCI, one where we ::strdup a string which we give to a new code blob as blob name. These strings actually leak. I opened https://bugs.openjdk.org/browse/JDK-8289632 to track this > - A couple of places in gtests. > > Note, wherever I introduced os::xxx and had to add os.hpp, I commented the include with "//malloc" to earmark those in case we ever want to move os::malloc and friends into its own header. 
> > ---- > > Tests: I build and ran gtests manually on x64 fastdebug, release, arm fastdebug, aarch64 fastdebug, x86 fastdebug (all Linux). I also tested build on Alpine x64. > > GHAs are in work. > > > [1] https://mail.openjdk.org/pipermail/hotspot-dev/2022-July/061602.html This pull request has now been integrated. Changeset: 688712f7 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/688712f75cd54caa264494adbe4dfeefc079e1dd Stats: 79 lines in 20 files changed: 36 ins; 1 del; 42 mod 8289633: Forbid raw C-heap allocation functions in hotspot and fix findings Reviewed-by: kbarrett, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/9356 From erik.osterlund at oracle.com Tue Jul 5 05:29:05 2022 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 5 Jul 2022 05:29:05 +0000 Subject: Obsoleting JavaCritical In-Reply-To: <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> Message-ID: For completeness, it should at least be considered that an alternative on the table is to make the JNI transitions fast using asymmetric dekker synchronization. If I understood the problem domain, you are running on linux, and not really using the async GC locking associated with exposing object addresses, but rather want the actual native call to be fast. In that context the arming side of handshakes/safepoints could use sys_membarrier where there is currently a StoreLoad fence. That way we could remove the StoreLoad fence on the back edge of the native transition, which is likely what actually costs something (last time I checked). 
In general, I'm not sure that this is a worthwhile tradeoff as the amortized cost of fencing has to sum up to the cost of the bigger hammer to be worth it. That would be a lot of native calls to pay for itself. But I suppose that alternative should at least be mentioned as it is a perfectly safe way of speeding up all native calls without resorting to cheating. The single thread handshake would be the most painful in this approach as we would use global synchronization to poke a single thread, unless we shot a signal or something instead for that use case. /Erik On 4 Jul 2022, at 23:47, Erik Osterlund wrote: Hi, Here is a clarification on the ZGC interactions. The initial form of JNI critical native calls was implemented as an internal thing for SPARC crypto libraries, private to the JDK. JNI calls on SPARC involved flushing register windows, which was actually rather slow. This form came with a mechanism for lazily activating the GC locker for primitive arrays that the crypto code needed direct access to. This essentially deferred invoking the GC locker from the Java thread to the safepoint synchronizer. The problematic aspect for generational ZGC was the async GC locker interactions. Its implication is that each GC safepoint might fail, because the GC locker can't be locked out before the safepoint is synchronized, so you end up instead trying to lock it inside GC safepoints, only to find that you couldn't. The failed GC safepoints lead to GC operations instead being started asynchronously from the GC locker. That was easier to deal with for the mainline version of ZGC since there was only one type of GC: full GCs. So we coped. With generational ZGC, the asynchronous operation has to figure out if it should poke the minor (young) and/or major (young + old) GC drivers. That problem is not easy to solve.
However, with JNI critical natives gone, the entire GC locker for ZGC is just a simple readers-writer lock, where critical native functions use the readers lock and the GC operations use the writer lock. The GC safepoints can't fail. With the new implementation that avoids doing a transition to native at all, the mentioned problem no longer occurs, as the safepoint synchronizer won't allow safepoints to creep in right in the middle of all this. So it would seem we are okay with that. So I think as long as we don't go with the previous async GC locker solution, we can remove ZGC interactions from the equation. However, you obviously get a trust problem instead with this flavour of cheating the system. Anything that takes a longish time in a critical native function without a native transition is going to be a disaster and hang the entire JVM. That is typically something we do not take lightly and is indeed why we have native transitions. So I would be delighted if we didn't resurrect ways of cheating the system anyway, unless this is absolutely... critical. It took a long time to get rid of the cheats. /Erik On 4 Jul 2022, at 18:07, Maurizio Cimadamore wrote: Hi, while I'm not an expert with some of the IO calls you mention (some of my colleagues are more knowledgeable in this area, so I'm sure they will have more info), my general sense is that, as with getrusage, if there is a system call involved, you already pay a hefty price for the user to kernel transition. On my machine this seems to cost around 200ns. In these cases, using JNI critical to shave off a dozen nanoseconds (at best!) seems just not worth it. So, of the functions in your list, the ones in which I *believe* dropping transitions would have the most effect are (if we exclude getpid, for which another approach is possible) clock_gettime and getcpu, as they might use vdso [1], which typically brings the performance of these calls closer to calls to shared lib functions.
If you have examples, e.g. where performance of recvmsg (or related calls) varies significantly between base JNI and critical JNI, please send them our way; I'm sure some of my colleagues would be interested to take a look. Popping back a couple of levels, I think it would be helpful to also define what's an acceptable regression in this context. Of course, in an ideal world, we'd like to see no performance regression at all. But JNI critical is an unsupported interface, which might misbehave with modern garbage collectors (e.g. ZGC) and that requires quite a bit of internal complexity which might, in the medium/long run, hinder the evolution of the Java platform (all these things have _some_ cost, even if the cost is not directly material to developers). In this vein, I think calls like clock_gettime tend to be more problematic: as they complete very quickly, you see the cost of transitions a lot more. In other cases, where syscalls are involved, the cost associated with transitions is more likely to be "in the noise". Of course, if we look at absolute numbers, dropping transitions would always yield "faster" code; but at the same time, going from 250ns to 245ns is very unlikely to result in a visible performance difference when considering an application as a whole, so I think it's critical here to decide _which_ use cases to prioritize. I think a good outcome of this discussion would be if we could come to some shared understanding of which native calls are truly problematic (e.g. clock_gettime-like), and then for the JDK to provide better (and more maintainable) alternatives for those (which might even be faster than using critical JNI). Thanks Maurizio [1] - https://man7.org/linux/man-pages/man7/vdso.7.html On 04/07/2022 12:23, Wojciech Kudla wrote: Thanks Maurizio, I raised this case mainly about clock_gettime and recvmsg/sendmsg, I think we're focusing on the wrong things here.
Feel free to drop the two syscalls from the discussion entirely, but the main usecases I have been presenting throughout this thread definitely stand. Thanks On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore > wrote: Hi Wojtek, thanks for sharing this list, I think this is a good starting point to understand more about your use case. Last week I've been looking at "getrusage" (as you mentioned it in an earlier email), and I was surprised to see that the call took a pointer to a (fairly big) struct which then needed to be initialized with some thread-local state: https://man7.org/linux/man-pages/man2/getrusage.2.html I've looked at the implementation, and it seems to be doing memset on the user-provided struct pointer, plus all the fields assignment. Eyeballing the implementation, this does not seem to me like a "classic" use case where dropping transition would help much. I mean, surely dropping transitions would help shaving some nanoseconds off the call, but it doesn't seem to me that the call would be shortlived enough to make a difference. Do you have some benchmarks on this one? I did some [1] and the call overhead seemed to come up at 260ns/op - w/o transition you might perhaps be able to get to 250ns, but that's in the noise? As for getpid, note that you can do (since Java 9): ProcessHandle.current().pid(); I believe the impl caches the result, so it shouldn't even make the native call. Maurizio [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java On 02/07/2022 07:42, Wojciech Kudla wrote: Hi Maurizio, Thanks for staying on this. > Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? From the top of my head: clock_gettime recvmsg recvmmsg sendmsg sendmmsg select getpid getcpu getrusage > Also, could you please tell us whether any of these calls need to interact with Java arrays? No arrays or objects of any type involved. 
Everything happens by means of passing raw pointers as longs and using other primitive types as function arguments.

> In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code?

Critical JNI natives are used solely to remove the cost of transitions. We don't get anywhere near the Java heap in native code.

In general I think it makes a lot of sense for Java as a language/platform to have some guards around unsafe code, but on the other hand the popularity of libraries employing Unsafe and their success in more performance-oriented corners of software engineering is a clear indicator there is a need for the JVM to provide access to more low-level primitives and mechanisms. I think it's entirely fair to tell developers that all bets are off when they get into some non-idiomatic scenarios but please don't take away a feature that greatly contributed to Java's success.

Kind regards,
Wojtek

On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore wrote:

Hi Wojciech, picking up this thread again. After some internal discussion, we realize that we don't know enough about your use case. While re-enabling JNI critical would obviously provide a quick fix, we're afraid that (a) developers might end up depending on JNI critical when they don't need to (perhaps also unaware of the consequences of depending on it) and (b) that there might actually be _better_ (as in: much faster) solutions than using critical native calls to address at least some of your use cases (that seemed to be the case with the clock_gettime example you mentioned). Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? Also, could you please tell us whether any of these calls need to interact with Java arrays?
In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code?

Regards
Maurizio

On 13/06/2022 21:38, Wojciech Kudla wrote:

Hi Mark,

Thanks for your input and apologies for the delayed response.

> If the platform included, say, an intrinsified System.nanoRealTime() method that returned clock_gettime(CLOCK_REALTIME), how much would that help developers in your unnamed industry?

Exposing the realtime clock with nanosecond granularity in the JDK would be a great step forward. I should have made it clear that I represent the fintech corner (investment banking to be exact) but the issues my message touches upon span areas such as HPC, audio processing, gaming, and the defense industry so it's not like we have an isolated case.

> In a similar vein, if people are finding it necessary to "replace parts of NIO with hand-crafted native code" then it would be interesting to understand what their requirements are

As for the other example I provided with making very short-lived syscalls such as recvmsg/recvmmsg, the premise is getting access to hardware timestamps on the ingress and egress ends as well as enabling batch receive with a single syscall and otherwise exploiting features unavailable from the JDK (like access to the CMSG interface, scatter/gather, etc). There are also other examples of calls that we'd love to make often and at the lowest possible cost (e.g. getrusage) but I'm not sure if there's a strong case for some of these ideas, that's why it might be worth looking into a more generic approach for performance-sensitive code. Hope this does a better job of explaining where we're coming from than my previous messages.

Thanks,
W

On Tue, Jun 7, 2022 at 6:31 PM > wrote:

2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com:
>> Yes for System.nanoTime(), but System.currentTimeMillis() reports CLOCK_REALTIME.
> Unfortunately System.currentTimeMillis() offers only millisecond granularity which is the reason why our industry has to resort to clock_gettime.

If the platform included, say, an intrinsified System.nanoRealTime() method that returned clock_gettime(CLOCK_REALTIME), how much would that help developers in your unnamed industry?

In a similar vein, if people are finding it necessary to "replace parts of NIO with hand-crafted native code" then it would be interesting to understand what their requirements are. Some simple enhancements to the NIO API would be much less costly to design and implement than a generalized user-level native-call intrinsification mechanism.

- Mark

From ayang at openjdk.org Tue Jul 5 07:34:30 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 5 Jul 2022 07:34:30 GMT Subject: RFR: 8289520: G1: Remove duplicate checks in G1BarrierSetC1::post_barrier In-Reply-To: <3MnSORD3lOo4Et75TBnD9Y6RmysHMi07QUBKT3mqcsY=.554293bc-0a11-483c-b9c6-b6bc25313223@github.com> References: <3MnSORD3lOo4Et75TBnD9Y6RmysHMi07QUBKT3mqcsY=.554293bc-0a11-483c-b9c6-b6bc25313223@github.com> Message-ID: On Thu, 30 Jun 2022 12:08:59 GMT, Albert Mingkun Yang wrote: > Simple change of removing effectively dead code. > > Test: tier1-3 Thanks for the review. ------------- PR: https://git.openjdk.org/jdk/pull/9333 From ayang at openjdk.org Tue Jul 5 07:34:30 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 5 Jul 2022 07:34:30 GMT Subject: Integrated: 8289520: G1: Remove duplicate checks in G1BarrierSetC1::post_barrier In-Reply-To: <3MnSORD3lOo4Et75TBnD9Y6RmysHMi07QUBKT3mqcsY=.554293bc-0a11-483c-b9c6-b6bc25313223@github.com> References: <3MnSORD3lOo4Et75TBnD9Y6RmysHMi07QUBKT3mqcsY=.554293bc-0a11-483c-b9c6-b6bc25313223@github.com> Message-ID: On Thu, 30 Jun 2022 12:08:59 GMT, Albert Mingkun Yang wrote: > Simple change of removing effectively dead code.
> > Test: tier1-3 This pull request has now been integrated. Changeset: 4c997ba8 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/4c997ba8303cc1116c73f6699888a77073a125a2 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod 8289520: G1: Remove duplicate checks in G1BarrierSetC1::post_barrier Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/9333 From mbaesken at openjdk.org Tue Jul 5 08:09:25 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 5 Jul 2022 08:09:25 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. Hi Markus, I think it would be possible and probably make sense to instead enhance the existing EventSweepCodeCache. Would it be okay to move down the EventSweepCodeCache to the end of NMethodSweeper::sweep_code_cache() ? I think post_sweep_event would get 2 more parameters, a boolean for JIT-restart and an integer freed_memory. Are you fine with this ? Are there some compatibility concerns when enhancing an existing event EventSweepCodeCache ? ------------- PR: https://git.openjdk.org/jdk/pull/9334 From rpressler at openjdk.org Tue Jul 5 08:31:31 2022 From: rpressler at openjdk.org (Ron Pressler) Date: Tue, 5 Jul 2022 08:31:31 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: > Please review the following bug fix: > > `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. 
> > Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. > > This change does three things: > > 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. > 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. > 3. In interp_only_mode, the c2i stub will not patch the callsite. > > This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 > > > Passes tiers 1-4 and Loom tiers 1-5. Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: - Add an "i2i" entry to enterSpecial - Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk19/pull/66/files - new: https://git.openjdk.org/jdk19/pull/66/files/4680aed2..7323f635 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk19&pr=66&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk19&pr=66&range=01-02 Stats: 220 lines in 13 files changed: 166 ins; 33 del; 21 mod Patch: https://git.openjdk.org/jdk19/pull/66.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/66/head:pull/66 PR: https://git.openjdk.org/jdk19/pull/66 From rpressler at openjdk.org Tue Jul 5 08:31:32 2022 From: rpressler at openjdk.org (Ron Pressler) Date: Tue, 5 Jul 2022 08:31:32 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v2] In-Reply-To: References: Message-ID: On Sat, 25 Jun 2022 01:23:47 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> 
`Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove outdated comment" > > This reverts commit 8f571d76e34bc64ceb31894184fba4b909e8fbfe. Trying a new approach of having another entry into `enterSpecial`, used only when in interp-only-mode, and where the call to `Continuation.enter` always resolves to its interpreted version. This requires more platform-specific code, and also makes the frame appear not `frame::safe_for_sender` when at that callsite, but losing an async poll when in interp_only_mode doesn't seem to be a big issue, and the problem can be easily fixed as JFR is too eager to call `frame::safe_for_sender`. Passes tiers 1-4 as well as Loom tiers 1-5. 
------------- PR: https://git.openjdk.org/jdk19/pull/66 From mgronlun at openjdk.org Tue Jul 5 09:59:28 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 5 Jul 2022 09:59:28 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: <7Z9QgLGygK78tr2YNUhtker3sPVReBSM4lalz-vuvOY=.5e1649a1-8141-4097-8faa-4ea1f06601f6@github.com> On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. Thanks for considering. It is fine to move the EventSweepCodeCache event around; you can see that the constructor takes the UNTIMED value. This means that timestamping is handled explicitly, by calling set_starttime() and set_endtime(). For compatibility, we can think of it like we are creating a subclass with extended fields. Just append the additional fields to the event. As for the memory field declarations, you can take a peek at other events in metadata.xml, for example, this field declaration is from CodeCacheFull: ------------- PR: https://git.openjdk.org/jdk/pull/9334 From maurizio.cimadamore at oracle.com Tue Jul 5 11:33:02 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Tue, 5 Jul 2022 12:33:02 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: Hi, As Erik explained in his reply, what we call "critical JNI" comes in two pieces: one removes Java to native thread transitions (which is what Wojciech is referring to), while another part interacts with the GC locker (basically to allow critical JNI code to access Java arrays w/o copying).
I think the latter part is the most problematic GC-wise. Then, regarding the former, I think there are still questions as to whether dropping transitions is the best way to get the performance boost required; for instance, yesterday I did some experiments with an experimental patch from Jorn (kudos) which re-enables an opt-in for "trivial" native calls in the Panama API. I used it to test clock_gettime, and, while there's an improvement, the results I got were not as conclusive as one might expect.

This is what I get w/ state transitions:

```
Benchmark                                 Mode  Cnt   Score   Error  Units
ClockgettimeTest.panama_monotonic         avgt   30  27.814 ± 0.165  ns/op
ClockgettimeTest.panama_monotonic_coarse  avgt   30  12.094 ± 0.103  ns/op
ClockgettimeTest.panama_monotonic_raw     avgt   30  27.719 ± 0.393  ns/op
ClockgettimeTest.panama_realtime          avgt   30  27.133 ± 0.280  ns/op
ClockgettimeTest.panama_realtime_coarse   avgt   30  26.812 ± 0.384  ns/op
```

And this is what I get with transitions removed:

```
Benchmark                                 Mode  Cnt   Score   Error  Units
ClockgettimeTest.panama_monotonic         avgt   30  22.383 ± 0.213  ns/op
ClockgettimeTest.panama_monotonic_coarse  avgt   30   6.312 ± 0.117  ns/op
ClockgettimeTest.panama_monotonic_raw     avgt   30  22.731 ± 0.279  ns/op
ClockgettimeTest.panama_realtime          avgt   30  22.503 ± 0.292  ns/op
ClockgettimeTest.panama_realtime_coarse   avgt   30  21.853 ± 0.100  ns/op
```

Here we can see a gain of roughly 5ns, obtained by dropping the transition. The only case where this makes a significant difference is with the monotonic_coarse flavor. In the other cases there's a difference, yes, but not as pronounced, simply because the term we're comparing against is bigger: it's easy to see a 5ns gain if your function runs for 10ns in total - but such a gain starts to get lost in the "noise" when functions run for longer.
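To make the "absolute vs. relative gain" point concrete, here is a minimal sketch that recomputes the savings from the scores in the two tables above. The class and method names (`TransitionGain`, `relativeGainPercent`) are made up for illustration and are not part of the benchmark mentioned in this thread; the getrusage entry uses the rough 260ns/250ns estimate quoted earlier in the thread, not a measurement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original benchmark: it only redoes
// the arithmetic on the numbers reported in this thread.
public class TransitionGain {

    /** Share of the "with transitions" time that is saved, in percent. */
    static double relativeGainPercent(double withNs, double withoutNs) {
        return (withNs - withoutNs) / withNs * 100.0;
    }

    public static void main(String[] args) {
        // ns/op copied from the tables: {with transitions, transitions removed}
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("monotonic", new double[] {27.814, 22.383});
        scores.put("monotonic_coarse", new double[] {12.094, 6.312});
        scores.put("monotonic_raw", new double[] {27.719, 22.731});
        scores.put("realtime", new double[] {27.133, 22.503});
        scores.put("realtime_coarse", new double[] {26.812, 21.853});
        // Rough estimate from earlier in the thread, not a measured pair.
        scores.put("getrusage (estimate)", new double[] {260.0, 250.0});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double with = e.getValue()[0];
            double without = e.getValue()[1];
            System.out.printf("%-22s saved %.3f ns (%.1f%%)%n",
                    e.getKey(), with - without, relativeGainPercent(with, without));
        }
    }
}
```

The absolute saving stays in the same few-nanosecond range for every clock_gettime flavor, while the relative saving drops from nearly half of the call (monotonic_coarse) to about a fifth for the ~27ns flavors, and to a few percent for a syscall-heavy call in the getrusage range.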
And that's the main issue with removing Java->native transitions: the "window" in which this optimization yields a positive effect is extremely narrow (anything lasting longer than 30ns probably won't see much difference), but, as you can see from the PR in [1], the VM changes required to support it touch quite a bit of stuff! Luckily, selectively disabling transitions from Panama is slightly more straightforward and, perhaps, for stuff like recvmsg syscalls that are bypassed, there's not much else we can do: while one could imagine Panama special-casing calls to clock_gettime, as that's a known "leaf", the same cannot be done with recvmsg, which is in general a blocking call. Panama also has a "trusted mode" flag (--enable-native-access), so there is a way in the Panama API to distinguish between safe and unsafe API points, which also helps with this.

The risk of course is for developers to see whatever mechanism is provided as some kind of "make my code go fast please" and apply it blindly, w/o fully understanding the consequences. What I said before about "extremely narrow window" remains true: in the vast majority of cases (like 99%) dropping state transitions can result in very big downsides, while the corresponding upsides are not big enough to even be noticeable (the Q/A in [2] arrives at a very similar conclusion).

All this said, selectively disabling state transitions from native calls made using the Panama foreign API seems the most straightforward way to offset the performance delta introduced by the removal of critical JNI. In part it's because the Panama API is more flexible, e.g. function descriptors allow us to model the distinction between a trivial and non-trivial call; in part it's because, as stated above, Panama can already reason about calls that are "unsafe" and that require extra permissions.
And, finally, it's also because, if we added back critical JNI, we'd probably add it back w/o its most problematic GC locker parts (that's what [1] does AFAIK) - which means it won't be a complete code reversal. So, perhaps, coming up with a fresh mechanism to drop transitions (only) could also be less confusing for developers. Of course this would require developers such as Wojciech to rewrite some of the code to use Panama instead of JNI.

And, coming back to clock_gettime, my feeling is that with the right tools (e.g. some intrinsics), we can make that go a lot faster than what is shown above. Being able to quickly get a timestamp seems a widely-enough applicable use case to deserve some special treatment.

So, perhaps, it's worth considering a _spectrum of solutions_ on how to improve the status quo, rather than investing solely in the removal of thread transitions.

Maurizio

[1] - https://github.com/openjdk/jdk19/pull/90/files
[2] - https://youtu.be/LoyBTqkSkZk?t=742

On 04/07/2022 18:38, Vitaly Davidovich wrote:
> To not sidetrack this thread with my previous reply:
>
> Maurizio - are you saying java criticals are *already* hindering ZGC and/or other planned Hotspot improvements? Or that theoretically they could and you'd like to remove/deprecate them now(ish)?
>
> If it's the latter, perhaps it's prudent to keep them around until a compelling case surfaces where they preclude or severely restrict evolution of the platform? If it's the former, would be curious what that is but would also understand the rationale behind wanting to remove it.
>
> On Mon, Jul 4, 2022 at 1:26 PM Vitaly Davidovich wrote:
>
> > On Mon, Jul 4, 2022 at 1:13 PM Wojciech Kudla wrote:
> >
> > > Thanks for your input, Vitaly. I'd be interested to find out more about the nature of the HW noise you observed in your benchmarks as our results were very consistent and it was pretty straightforward to pinpoint the culprit as JNI call overhead. Maybe it was just easier for us because we disallow C- and P-state transitions and put a lot of effort into eliminating platform jitter in general. Were you maybe running on a CPU model that doesn't support constant TSC? I would also suggest retrying with LAPIC interrupts suppressed (with: cli/sti) to maybe see if it's the kernel and not the hardware.
> >
> > This was on a Broadwell Xeon chipset with constant tsc. All the typical jitter sources were reduced: C/P states disabled in bios, max turbo enabled, IRQs steered away, core isolated, etc. By the way, by noise I don't mean the results themselves were noisy - they were constant run to run. I just meant the delta between normal vs critical JNI entrypoints was very minimal - i.e. "in the noise", particularly with rdtsc.
> >
> > I can try to remeasure on newer Intel but see below ...
> >
> > > 100% agree on rdtsc(p) and snippets. There are some narrow use cases where one can get some substantial speed ups with direct access to prefetch or by abusing misprediction to keep icache hot. These scenarios are sadly only available with inline assembly. I know of a few shops that go to the length of forking Graal, etc to achieve that but am quite convinced such capabilities would be welcome and utilized by many more groups if they were easily accessible from java.
> >
> > I'm of the firm (and perhaps controversial for some :)) opinion these days that Java is simply the wrong platform/tool for low latency cases that warrant this level of control. There are very strong headwinds even outside of JNI costs. And the "real" problem with JNI, besides transition costs, is lack of inlining into the native calls. So even if JVM transition costs are fully eliminated, there's still an optimization fence due to lost inlining (not unlike native code calling native fns via shared libs).
> >
> > That's not to say that perf regressions are welcomed - nobody likes those :).
> >
> > > Thanks,
> > > W.
> > >
> > > On Mon, Jul 4, 2022 at 5:51 PM Vitaly Davidovich wrote:
> > >
> > > > I'd add rdtsc(p) wrapper functions to the list. These are usually either inline asm or compiler intrinsic in the JNI entrypoint. In addition, any native libs exposed via JNI that have "trivial" functions are also candidates for faster calling conventions. There are sometimes ways to mitigate the call overhead (e.g. batching) but it's not always feasible.
> > > >
> > > > I'll add that last time I tried to measure the improvement of Java criticals for clock_gettime (and rdtsc) it looked to be in the noise on the hardware I was testing on. It got to the point where I had to instrument the critical and normal JNI entrypoints to confirm the critical was being hit. The critical calling convention isn't significantly different *if* basic primitives (or no args at all) are passed as args. JNIEnv*, IIRC, is loaded from a register so that's minor. jclass (for static calls, which is what's relevant here) should be a compiled constant. Critical call still has a GCLocker check. So I'm not actually sure what the significant difference is for "lightweight" (i.e. few primitive or no args, primitive return types) calls.
> > > >
> > > > In general, I do think it'd be nice if there was a faster native call sequence, even if it comes with a caveat emptor and/or special requirements on the callee (not unlike the requirements for criticals). I think Vladimir Ivanov was working on "snippets" that allowed dynamic construction of a native call, possibly including assembly. Not sure where that exploration is these days, but that would be a welcome capability.
> > > >
> > > > My $.02. Happy 4th of July for those celebrating!
> > Vitaly > > On Mon, Jul 4, 2022 at 12:04 PM Maurizio Cimadamore > wrote: > > Hi, > while I'm not an expert with some of the IO calls you > mention (some of my colleagues are more knowledgeable > in this area, so I'm sure they will have more info), > my general sense is that, as with getrusage, if there > is a system call involved, you already pay a hefty > price for the user to kernel transition. On my machine > this seem to cost around 200ns. In these cases, using > JNI critical to shave off a dozen of nanoseconds (at > best!) seems just not worth it. > > So, of the functions in your list, the ones in which I > *believe*? dropping transitions would have the most > effect are (if we exclude getpid, for which another > approach is possible) clock_gettime and getcpu, I > believe, as they might use vdso [1], which typically > brings the performance of these call closer to calls > to shared lib functions. > > If you have examples e.g. where performance of recvmsg > (or related calls) varies significantly between base > JNI and critical JNI, please send them our way; I'm > sure some of my colleagues would be intersted to take > a look. > > Popping back a couple of levels, I think it would be > helpful to also define what's an acceptable regression > in this context. Of course, in an ideal world, we'd > like to see no performance regression at all. But JNI > critical is an unsupported interface, which might > misbehave with modern garbage collectors (e.g. ZGC) > and that requires quite a bit of internal complexity > which might, in the medium/long run, hinder the > evolution of the Java platform (all these things have > _some_ cost, even if the cost is not directly material > to developers). In this vein, I think calls like > clock_gettime tend to be more problematic: as they > complete very quickly, you see the cost of transitions > a lot more. In other cases, where syscalls are > involved, the cost associated to transitions are more > likely to be "in the noise". 
Of course if we look at > absolute numbers, dropping transitions would always > yield "faster" code; but at the same time, going from > 250ns to 245ns is very unlikely to result in visible > performance difference when considering an application > as a whole, so I think it's critical here to decide > _which_ use cases to prioritize. > > I think a good outcome of this discussion would be if > we could come to some shared understanding of which > native calls are truly problematic (e.g. > clock_gettime-like), and then for the JDK to provide > better (and more maintainable) alternatives for those > (which might even be faster than using critical JNI). > > Thanks > Maurizio > > [1] - https://man7.org/linux/man-pages/man7/vdso.7.html > > On 04/07/2022 12:23, Wojciech Kudla wrote: >> Thanks Maurizio, >> >> I raised this case mainly about clock_gettime and >> recvmsg/sendmsg, I think we're focusing on the wrong >> things here. Feel free to drop the two syscalls from >> the discussion entirely, but the main usecases I have >> been presenting throughout this thread definitely stand. >> >> Thanks >> >> >> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore >> wrote: >> >> Hi Wojtek, >> thanks for sharing this list, I think this is a >> good starting point to understand more about your >> use case. >> >> Last week I've been looking at "getrusage" (as >> you mentioned it in an earlier email), and I was >> surprised to see that the call took a pointer to >> a (fairly big) struct which then needed to be >> initialized with some thread-local state: >> >> https://man7.org/linux/man-pages/man2/getrusage.2.html >> >> I've looked at the implementation, and it seems >> to be doing memset on the user-provided struct >> pointer, plus all the fields assignment. >> Eyeballing the implementation, this does not seem >> to me like a "classic" use case where dropping >> transition would help much. 
I mean, surely >> dropping transitions would help shaving some >> nanoseconds off the call, but it doesn't seem to >> me that the call would be shortlived enough to >> make a difference. Do you have some benchmarks on >> this one? I did some [1] and the call overhead >> seemed to come up at 260ns/op - w/o transition >> you might perhaps be able to get to 250ns, but >> that's in the noise? >> >> As for getpid, note that you can do (since Java 9): >> >> ProcessHandle.current().pid(); >> >> I believe the impl caches the result, so it >> shouldn't even make the native call. >> >> Maurizio >> >> [1] - >> http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >> >> On 02/07/2022 07:42, Wojciech Kudla wrote: >>> Hi Maurizio, >>> >>> Thanks for staying on this. >>> >>> > Could you please provide a rough list of the >>> native calls you make where you believe critical >>> JNI is having a real impact in the performance >>> of your application? >>> >>> From the top of my head: >>> clock_gettime >>> recvmsg >>> recvmmsg >>> sendmsg >>> sendmmsg >>> select >>> getpid >>> getcpu >>> getrusage >>> >>> > Also, could you please tell us whether any of >>> these calls need to interact with Java arrays? >>> No arrays or objects of any type involved. >>> Everything happens by the means of passing raw >>> pointers as longs and using other primitive >>> types as function arguments. >>> >>> > In other words, do you use critical JNI to >>> remove the cost associated with thread >>> transitions, or are you also taking advantage of >>> accessing on-heap memory _directly_ from native >>> code? >>> Criticial JNI natives are used solely to remove >>> the cost of transitions. We don't get anywhere >>> near java heap in native code. 
>>> >>> In general I think it makes a lot of sense for >>> Java as a language/platform to have some guards >>> around unsafe code, but on the other hand the >>> popularity of libraries employing Unsafe and >>> their success in more performance-oriented >>> corners of software engineering is a clear >>> indicator there is a need for the JVM to provide >>> access to more low-level primitives and mechanisms. >>> I think it's entirely fair to tell developers >>> that all bets are off when they get into some >>> non-idiomatic scenarios but please don't take >>> away a feature that greatly contributed to >>> Java's success. >>> >>> Kind regards, >>> Wojtek >>> >>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio >>> Cimadamore wrote: >>> >>> Hi Wojciech, >>> picking up this thread again. After some >>> internal discussion, we realize that we >>> don't know enough about your use case. While >>> re-enabling JNI critical would obviously >>> provide a quick fix, we're afraid that (a) >>> developers might end up depending on JNI >>> critical when they don't need to (perhaps >>> also unaware of the consequences of >>> depending on it) and (b) that there might >>> actually be _better_ (as in: much faster) >>> solutions than using critical native calls >>> to address at least some of your use cases >>> (that seemed to be the case with the >>> clock_gettime example you mentioned). Could >>> you please provide a rough list of the >>> native calls you make where you believe >>> critical JNI is having a real impact in the >>> performance of your application? Also, could >>> you please tell us whether any of these >>> calls need to interact with Java arrays? In >>> other words, do you use critical JNI to >>> remove the cost associated with thread >>> transitions, or are you also taking >>> advantage of accessing on-heap memory >>> _directly_ from native code? 
>>> >>> Regards >>> Maurizio >>> >>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>> Hi Mark, >>>> >>>> Thanks for your input and apologies for the >>>> delayed response. >>>> >>>> > If the platform included, say, an >>>> intrinsified System.nanoRealTime() >>>> method that returned >>>> clock_gettime(CLOCK_REALTIME), how much would >>>> that help developers in your unnamed industry? >>>> >>>> Exposing realtime clock with nanosecond >>>> granularity in the JDK would be a great >>>> step forward. I should have made it clear >>>> that I represent fintech corner (investment >>>> banking to be exact) but the issues my >>>> message touches upon span areas such as >>>> HPC, audio processing, gaming, and defense >>>> industry so it's not like we have an >>>> isolated case. >>>> >>>> > In a similar vein, if people are finding >>>> it necessary to ?replace parts >>>> of NIO with hand-crafted native code? then >>>> it would be interesting to >>>> understand what their requirements are >>>> >>>> As for the other example I provided with >>>> making very short lived syscalls such as >>>> recvmsg/recvmmsg the premise is getting >>>> access to hardware timestamps on the >>>> ingress and egress ends as well as enabling >>>> batch receive with a single syscall and >>>> otherwise exploiting features unavailable >>>> from the JDK (like access to CMSG >>>> interface, scatter/gather, etc). >>>> There are also other examples of calls that >>>> we'd love to make often and at lowest >>>> possible cost (ie. getrusage) but I'm not >>>> sure if there's a strong case for some of >>>> these ideas, that's why it might be worth >>>> looking into more generic approach for >>>> performance sensitive code. >>>> Hope this does better job at explaining >>>> where we're coming from than my previous >>>> messages. 
>>>>
>>>> Thanks,
>>>> W
>>>>
>>>> On Tue, Jun 7, 2022 at 6:31 PM wrote:
>>>>
>>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com:
>>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports
>>>> >> CLOCK_REALTIME.
>>>> >
>>>> > Unfortunately System.currentTimeMillis() offers only millisecond
>>>> > granularity which is the reason why our industry has to resort to
>>>> > clock_gettime.
>>>>
>>>> If the platform included, say, an intrinsified System.nanoRealTime() method that returned clock_gettime(CLOCK_REALTIME), how much would that help developers in your unnamed industry?
>>>>
>>>> In a similar vein, if people are finding it necessary to "replace parts of NIO with hand-crafted native code" then it would be interesting to understand what their requirements are. Some simple enhancements to the NIO API would be much less costly to design and implement than a generalized user-level native-call intrinsification mechanism.
>>>>
>>>> - Mark
>>>>
> --
> Sent from my phone

From tschatzl at openjdk.org Tue Jul 5 11:46:15 2022
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 5 Jul 2022 11:46:15 GMT
Subject: RFR: 8289739: Add G1 specific GC breakpoints for testing
Message-ID:

Hi all,

can I have reviews for this change that adds a few G1 specific GC breakpoints for future testing [JDK-8289740](https://bugs.openjdk.org/browse/JDK-8289740).
Testing: gha, local testing

Thanks,
Thomas

-------------

Commit messages:
 - Initial change, adding new concurrent cycle breakpoints

Changes: https://git.openjdk.org/jdk/pull/9376/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9376&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8289739
Stats: 23 lines in 3 files changed: 23 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/9376.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9376/head:pull/9376

PR: https://git.openjdk.org/jdk/pull/9376

From mbaesken at openjdk.org Tue Jul 5 12:22:40 2022
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 5 Jul 2022 12:22:40 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event
In-Reply-To:
References:
Message-ID:

On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote:

> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart.

Hi Markus, another question came up while looking into this: the current SweepCodeCache event has a threshold of 100 ms set in both default.jfc and profile.jfc. This was probably fine for existing usages. But would we sometimes lose the JIT restart events if we incorporate the JIT restart and freed memory into the current SweepCodeCache event?

-------------

PR: https://git.openjdk.org/jdk/pull/9334

From rehn at openjdk.org Tue Jul 5 12:44:45 2022
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 5 Jul 2022 12:44:45 GMT
Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 5 Jul 2022 08:31:31 GMT, Ron Pressler wrote:

>> Please review the following bug fix:
>>
>> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`.
>> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: > > - Add an "i2i" entry to enterSpecial > - Fix comment I think this is much better. I'll give it another round tomorrow. ------------- PR: https://git.openjdk.org/jdk19/pull/66 From dholmes at openjdk.org Tue Jul 5 13:05:38 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 5 Jul 2022 13:05:38 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp In-Reply-To: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: On Mon, 4 Jul 2022 23:07:27 GMT, Ioi Lam wrote: > Please review this simple change that only renames a few classes and moved some code around. No functional changes. > > The following classes are used only sparingly. 
They should be moved to a new header file share/runtime/suspend.hpp to minimize the size of os.hpp
>
> - SuspendedThreadTaskContext
> - SuspendedThreadTask
> - SuspendResume
>
> I didn't move the OS-specific implementation to a new file -- the POSIX implementation is currently inside [signals_posix.cpp](https://github.com/openjdk/jdk/blob/df063f7db18a40ea7325fe608b3206a6dff812c1/src/hotspot/os/posix/signals_posix.cpp#L1790) mixed with other signal handling code, so it doesn't seem a good idea to move out just the code for the above 3 classes.
>
> The only other implementation is in os_windows.cpp. I could move the code to suspend_windows.cpp, but I don't feel very motivated unless someone insists.

It is hard for me to see what is truly shared, what is posix-only and what is windows-only. Probably the existing placement is not very good when it comes to making those distinctions, but moving things into new header files just seems to highlight the problem.

I'm also concerned that this and other recent header file changes are breaking the convention that we generally have a foo.hpp and a foo.cpp file. In some cases now, if I see a declaration in a header file, I have to go and search to find the right cpp file.

src/hotspot/share/runtime/suspend.cpp line 34:

> 32: }
> 33:
> 34: #ifndef _WINDOWS

I'm curious why this isn't needed on windows? With this ifndef this is really posix-only code and so should be in os/posix/suspend_posix.cpp, or at least os_posix.cpp.
-------------

PR: https://git.openjdk.org/jdk/pull/9371

From mgronlun at openjdk.org Tue Jul 5 13:33:40 2022
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Tue, 5 Jul 2022 13:33:40 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event
In-Reply-To:
References:
Message-ID: <2nJVvpUsAifciqY02N6rN4PCKMy2ur6Ampzx5PV6kAk=.6705ac65-a931-4a04-bc99-a160b9caf3fe@github.com>

On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote:

> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart.

That is a good reflection. Yes, if the duration of the restartable sweep is below the threshold, then no event will be sent.

Can you explain a bit more about what "JIT restart" actually means?

-------------

PR: https://git.openjdk.org/jdk/pull/9334

From mbaesken at openjdk.org Tue Jul 5 13:40:39 2022
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 5 Jul 2022 13:40:39 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event
In-Reply-To: <2nJVvpUsAifciqY02N6rN4PCKMy2ur6Ampzx5PV6kAk=.6705ac65-a931-4a04-bc99-a160b9caf3fe@github.com>
References: <2nJVvpUsAifciqY02N6rN4PCKMy2ur6Ampzx5PV6kAk=.6705ac65-a931-4a04-bc99-a160b9caf3fe@github.com>
Message-ID:

On Tue, 5 Jul 2022 13:31:34 GMT, Markus Grönlund wrote:

> Can you explain a bit more about what "JIT restart" actually means?

The comment at line 433 of sweeper.cpp and following explains it. JIT compilation had been stopped earlier and now, after sweeping has potentially freed some memory, JIT compilation is re-enabled. That is also why there is a log.debug("restart compiler"); in the code.
-------------

PR: https://git.openjdk.org/jdk/pull/9334

From mbaesken at openjdk.org Tue Jul 5 13:47:31 2022
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 5 Jul 2022 13:47:31 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2]
In-Reply-To:
References:
Message-ID: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com>

> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart.

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  Incorporate JIT compiler restart into EventSweepCodeCache

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9334/files
  - new: https://git.openjdk.org/jdk/pull/9334/files/dbfb8775..29665ecb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=00-01

Stats: 36 lines in 5 files changed: 10 ins; 22 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/9334.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334

PR: https://git.openjdk.org/jdk/pull/9334

From mgronlun at openjdk.org Tue Jul 5 13:47:32 2022
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Tue, 5 Jul 2022 13:47:32 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event
In-Reply-To:
References:
Message-ID: <_iQiTh6jxuxMiyI35PbDr4Qlt64BHkIW3FiESlkqzO0=.ef7d1a75-d649-4c18-b00a-f358cfb50442@github.com>

On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote:

> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart.

Ok, so there is a corresponding "compiler stopped", implicitly noted by firing EventCodeCacheFull?
-------------

PR: https://git.openjdk.org/jdk/pull/9334

From eosterlund at openjdk.org Tue Jul 5 13:53:46 2022
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 5 Jul 2022 13:53:46 GMT
Subject: RFR: 8286957: Held monitor count [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 4 Jul 2022 11:05:26 GMT, Robbin Ehn wrote:

>> The current implementation does not count all monitor enters, counts high up in the abstraction, and causes a performance regression on aarch64 with some benchmarks due to C2 changes.
>>
>> This change makes the counting exact by pushing the counting down in the abstraction.
>> The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack".
>>
>> An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement.
>>
>> Fixed aarch64, x64, x86 and zero.
>>
>> Passes t1-8
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
> - Merge branch 'master' into held-mon-count
> - Fixed var name
> - Merge branch 'master' into held-mon-count
> - Merge branch 'master' into held-mon-count
> - 8286957 - PR Baseline

Looks good. We might however want the counter to be 64 bit so we don't have to think about overflows. I suppose nasty JNI code could lock the entire heap and then unlock it.

-------------

Changes requested by eosterlund (Reviewer).
PR: https://git.openjdk.org/jdk/pull/8945

From mbaesken at openjdk.org Tue Jul 5 13:55:39 2022
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 5 Jul 2022 13:55:39 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event
In-Reply-To: <_iQiTh6jxuxMiyI35PbDr4Qlt64BHkIW3FiESlkqzO0=.ef7d1a75-d649-4c18-b00a-f358cfb50442@github.com>
References: <_iQiTh6jxuxMiyI35PbDr4Qlt64BHkIW3FiESlkqzO0=.ef7d1a75-d649-4c18-b00a-f358cfb50442@github.com>
Message-ID:

On Tue, 5 Jul 2022 13:43:44 GMT, Markus Grönlund wrote:

> Ok, so there is a corresponding "compiler stopped", implicitly noted by firing EventCodeCacheFull?

Yes, I think the EventCodeCacheFull (see CodeCache::report_codemem_full) covers the JIT compiler stop pretty well. That's why I did not attempt to add a JIT stop JFR event: we have this already.

-------------

PR: https://git.openjdk.org/jdk/pull/9334

From kbarrett at openjdk.org Tue Jul 5 13:59:40 2022
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 5 Jul 2022 13:59:40 GMT
Subject: RFR: 8289739: Add G1 specific GC breakpoints for testing
In-Reply-To:
References:
Message-ID:

On Tue, 5 Jul 2022 11:35:19 GMT, Thomas Schatzl wrote:

> Hi all,
>
> can I have reviews for this change that adds a few G1 specific GC breakpoints for future testing [JDK-8289740](https://bugs.openjdk.org/browse/JDK-8289740).
>
> Testing: gha, local testing
>
> Thanks,
> Thomas

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9376

From rschmelter at openjdk.org Tue Jul 5 14:17:13 2022
From: rschmelter at openjdk.org (Ralf Schmelter)
Date: Tue, 5 Jul 2022 14:17:13 GMT
Subject: RFR: 8289745: JfrStructCopyFailed uses heap words instead of bytes for object sizes
Message-ID:

The values for smallestSize, firstSize and totalSize in the CopyFailed type are set as the number of heap words, but should be number of bytes. This leads to wrong values in the PromotionFailed and EvacuationFailed JFR events containing this type.
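
The unit mismatch described above can be illustrated with a small standalone sketch. This is not the code from the pull request: `kHeapWordSize`, `CopyFailedSizes`, `words_to_bytes` and `to_bytes` are hypothetical names, and the sketch assumes only that one heap word is as wide as a pointer.

```cpp
#include <cassert>
#include <cstddef>

// Assumption for this sketch: one heap word is as wide as a pointer.
// kHeapWordSize is a stand-in name, not HotSpot's own constant.
constexpr std::size_t kHeapWordSize = sizeof(void*);

// Hypothetical mirror of the three size fields named in the description.
struct CopyFailedSizes {
  std::size_t smallest_size;
  std::size_t first_size;
  std::size_t total_size;
};

// Convert a single size counted in heap words into bytes.
inline std::size_t words_to_bytes(std::size_t words) {
  return words * kHeapWordSize;
}

// Convert a whole record from heap words to bytes before it is reported.
inline CopyFailedSizes to_bytes(const CopyFailedSizes& in_words) {
  // Sanity check: the smallest failed object cannot exceed the total.
  assert(in_words.smallest_size <= in_words.total_size);
  return CopyFailedSizes{words_to_bytes(in_words.smallest_size),
                         words_to_bytes(in_words.first_size),
                         words_to_bytes(in_words.total_size)};
}
```

On a platform with 8-byte pointers, a record of 2, 3 and 10 heap words would be reported as 16, 24 and 80 bytes; reporting the raw word counts instead understates every size by that factor.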
------------- Commit messages: - Convert size in heap words to bytes for CopyFailed JFR type Changes: https://git.openjdk.org/jdk/pull/9378/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9378&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289745 Stats: 14 lines in 4 files changed: 8 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/9378.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9378/head:pull/9378 PR: https://git.openjdk.org/jdk/pull/9378 From iklam at openjdk.org Tue Jul 5 16:47:27 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 5 Jul 2022 16:47:27 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp [v2] In-Reply-To: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: > Please review this simple change that only renames a few classes and moved some code around. No functional changes. > > The following classes are used only sparingly. They should be moved to a new header file share/runtime/suspend.hpp to minimize the size of os.hpp > > - SuspendedThreadTaskContext > - SuspendedThreadTask > - SuspendResume > > I didn't move the OS-specific implementation to a new file -- the POSIX implementation is currently inside [signals_posix.cpp](https://github.com/openjdk/jdk/blob/df063f7db18a40ea7325fe608b3206a6dff812c1/src/hotspot/os/posix/signals_posix.cpp#L1790) mixed with other signal handling code, so it doesn't seem a good idea to move out just the code for the above 3 classes. > > The only other implementation is in os_windows.cpp. I could move the code to suspend_windows.cpp, but I don't feel very motivated unless someone insists. 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: moved SuspendResume class to os/posix directory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9371/files - new: https://git.openjdk.org/jdk/pull/9371/files/2df83ec8..8265ecb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9371&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9371&range=00-01 Stats: 320 lines in 8 files changed: 187 ins; 130 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9371.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9371/head:pull/9371 PR: https://git.openjdk.org/jdk/pull/9371 From iklam at openjdk.org Tue Jul 5 16:51:25 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 5 Jul 2022 16:51:25 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp [v2] In-Reply-To: References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: On Tue, 5 Jul 2022 12:57:52 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> moved SuspendResume class to os/posix directory > > src/hotspot/share/runtime/suspend.cpp line 34: > >> 32: } >> 33: >> 34: #ifndef _WINDOWS > > I'm curious why this isn't needed on windows? With this ifndef this is really posix-only code and so should be in os/posix/suspend_posix.cpp, or at least os_posix.cpp. The `SuspendResume` class is used only inside signals_posix.cpp (which requires the field `SuspendResume OsThread::sr` to be declared in osThread_{aix, bsd, linux}.hpp). I moved this class into the os/posix directory. I also renamed suspend.hpp to suspendedThreadTask.hpp. 
------------- PR: https://git.openjdk.org/jdk/pull/9371 From kvn at openjdk.org Tue Jul 5 18:07:20 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Jul 2022 18:07:20 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: <44P3YBpPj6uHZvDEsKIx01X4akPESr11IDxw8lGVW4o=.d6956298-d2e3-4079-b347-889780d6a58d@github.com> On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Incorporate JIT compiler restart into EventSweepCodeCache Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9334 From cjplummer at openjdk.org Tue Jul 5 17:45:24 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 5 Jul 2022 17:45:24 GMT Subject: RFR: 8289436: Make the redefine timer statistics more accurate In-Reply-To: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> References: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> Message-ID: <9vOGMA6gbtLaCeipmJgXrHjFWipHyzr61X1tahFlSOo=.0a1367bc-eeca-4549-9864-fb98576bbce6@github.com> On Wed, 29 Jun 2022 08:30:12 GMT, Tongbao Zhang wrote: > Make the redefine timer statistics more accurate > > After some significant performance improvements of the class redefinition, like: > https://bugs.openjdk.org/browse/JDK-8139551 > https://bugs.openjdk.org/browse/JDK-8078725 > > Some time-consumption operation were moved out the "redefine_single_class" > So the time added by phase 1 and phase 2 cannot be accurately represented to the time of "vmop_doit" Marked as reviewed by cjplummer (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9322 From tsteele at openjdk.org Tue Jul 5 17:46:00 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Tue, 5 Jul 2022 17:46:00 GMT Subject: [jdk19] RFR: 8288128: S390X: Fix crashes after JDK-8284161 (Virtual Threads) Message-ID: <6k2RcCL12vxkAOeWKadgithVNKDo2lVd-BI0xJPSR60=.ebb5f652-d658-4be9-a2cf-12ebafe371ac@github.com> This PR adapts the changes to the PPC port from JDK-8286446 and JDK-8288105 to s390 in order to fix the crashes preventing a successful `make images`. 
In addition, the following (minor) changes were made to the changesets above: - Change two instances of `own_abi()->lr` to `own_abi()->return_pc` (frame_s390.cpp:235,247) - Remove alignment assertion from `frame::setup` (frame_s390.inline.hpp:73) - Remove original_pc assertion from `frame::patch_pc` (frame_s390.cpp:251) - Add continuations_enabled guard to `generate_phase1` (stubGenerator_s390.cpp:2937) ------------- Commit messages: - Adapt and apply changes from JDK-8288105 to s390 - Adapt and apply changes from JDK-8286446 to s390. Changes: https://git.openjdk.org/jdk19/pull/110/files Webrev: https://webrevs.openjdk.org/?repo=jdk19&pr=110&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288128 Stats: 193 lines in 7 files changed: 111 ins; 41 del; 41 mod Patch: https://git.openjdk.org/jdk19/pull/110.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/110/head:pull/110 PR: https://git.openjdk.org/jdk19/pull/110 From duke at openjdk.org Tue Jul 5 17:54:33 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Tue, 5 Jul 2022 17:54:33 GMT Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls In-Reply-To: References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com> Message-ID: On Fri, 17 Jun 2022 09:25:18 GMT, Andrew Haley wrote: >>> Based on your numbers (bytes saved / number of methods) I believe we're saving 16 bytes per method. >> >> How did you get 16? >> dotty arm64: $ 820544 / 4592 = 179 $ >> >>> How much more is there? What can we do with stubs besides duplicated static stubs removal? >> >> For arm64 we have: moving a pointer to metadata to a register and moving the address of the interpreter to a register. 
>>
>> 0x0000ffff79bd2560: isb ; {static_stub}
>> 0x0000ffff79bd2564: mov x12, #0x388 // #904
>> ; {metadata({method} {0x0000ffff18400388} 'error' '(ILjava/lang/String;)V' in 'Test')}
>> 0x0000ffff79bd2568: movk x12, #0x1840, lsl #16
>> 0x0000ffff79bd256c: movk x12, #0xffff, lsl #32
>> 0x0000ffff79bd2570: mov x8, #0xe58c // #58764
>> 0x0000ffff79bd2574: movk x8, #0x793b, lsl #16
>> 0x0000ffff79bd2578: movk x8, #0xffff, lsl #32
>> 0x0000ffff79bd257c: br x8
>>
>> If we never patch the branch to the interpreter, we can optimize it at link time either to a direct branch or an adrp based far jump. I also created https://bugs.openjdk.org/browse/JDK-8286142 to reduce metadata mov instructions.
>>
>>> Is it possible (theoretically) to move the stub out of the calling method to share it between methods?
>>
>> It is possible but it complicates CodeCache maintenance. Stubs use a pointer to metadata. When a class and methods are unloaded, we will need to invalidate all corresponding stubs.
>>
>> I can check with benchmarks how many stubs can be shared among methods.
>
>> If we never patch the branch to the interpreter, we can optimize it at link time either to a direct branch or an adrp based far jump. I also created https://bugs.openjdk.org/browse/JDK-8286142 to reduce metadata mov instructions.
>
> If we emit the address of the interpreter once, at the start of the stub section, we can replace the branch to the interpreter with
> `ldr rscratch1, adr; br rscratch1`.

Hi Andrew (@theRealAph),
Your comments are usually highly useful and help to identify missed issues. Do you have any further comments?
Thanks, Evgeny ------------- PR: https://git.openjdk.org/jdk/pull/8816 From sspitsyn at openjdk.org Tue Jul 5 17:24:28 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 5 Jul 2022 17:24:28 GMT Subject: RFR: 8289436: Make the redefine timer statistics more accurate In-Reply-To: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> References: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> Message-ID: On Wed, 29 Jun 2022 08:30:12 GMT, Tongbao Zhang wrote: > Make the redefine timer statistics more accurate > > After some significant performance improvements of the class redefinition, like: > https://bugs.openjdk.org/browse/JDK-8139551 > https://bugs.openjdk.org/browse/JDK-8078725 > > Some time-consumption operation were moved out the "redefine_single_class" > So the time added by phase 1 and phase 2 cannot be accurately represented to the time of "vmop_doit" This looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9322 From phh at openjdk.org Tue Jul 5 20:47:58 2022 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 5 Jul 2022 20:47:58 GMT Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls [v2] In-Reply-To: References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com> Message-ID: <6-SqG67oF4oG2WCqTq-udI2aeXEyDPyw31po646cjt4=.6d0a11fd-f6fe-4906-95d4-b82ac14f5f66@github.com> On Wed, 29 Jun 2022 14:50:59 GMT, Evgeny Astigeevich wrote: >> ## Problem >> Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static Java methods and final Java methods have statically bound information about a callee during compilation. Such calls can share stubs to the interpreter. 
>>
>> Each stub to the interpreter has a relocation record (accessed via `relocInfo`) which provides the address of the stub and the address of its owner. `relocInfo` has an offset which is an offset from the previously known relocatable address. The address of a stub is calculated as the address provided by the previous `relocInfo` plus the offset.
>>
>> Each Java call has:
>> - A relocation for a call site.
>> - A relocation for a stub to the interpreter.
>> - A stub to the interpreter.
>> - If far jumps are used (arm64 case):
>>   - A trampoline relocation.
>>   - A trampoline.
>>
>> We cannot avoid creating relocations. They are needed to support patching call sites.
>> With shared stubs there will be multiple relocations having the same stub address but different owners' addresses.
>> If we try to generate relocations as we go there will be a case which requires negative offsets:
>>
>> reloc1 ---> 0x0: stub1
>> reloc2 ---> 0x4: stub2 (reloc2.addr = reloc1.addr + reloc2.offset = 0x0 + 4)
>> reloc3 ---> 0x0: stub1 (reloc3.addr = reloc2.addr + reloc3.offset = 0x4 - 4)
>>
>>
>> `CodeSection` does not support negative offsets. It [assumes](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L195) that the addresses relocations point at grow upward.
>> Supporting negative offsets would reduce the offset range by half. This can increase the number of filler records, the empty `relocInfo` records used to reduce offset values. Also, negative offsets are only needed for `static_stub_type`; the other 13 types don't need them.
>>
>> ## Solution
>> In this PR, creation of stubs is done in two stages. First we collect requests for creating shared stubs: a callee `ciMethod*` and an offset of a call in `CodeBuffer` (see [src/hotspot/share/asm/codeBuffer.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8ae)).
Then we have the finalisation phase (see [src/hotspot/share/ci/ciEnv.cpp](https://github.com/openjdk/jdk/pull/8816/files#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450)), where `CodeBuffer::finalize_stubs()` creates shared stubs in `CodeBuffer`: a stub and multiple relocations sharing it. The first relocation will have positive offset. The rest will have zero offsets. This approach does not need negative offsets. As creation of relocations and stubs is platform dependent, `CodeBuffer::finalize_stubs()` calls `CodeBuffer::pd_finalize_stubs()` where platforms should put their code. >> >> This PR provides implementations for x86, x86_64 and aarch64. [src/hotspot/share/asm/codeBuffer.inline.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-c268e3719578f2980edaa27c0eacbe9f620124310108eb65d0f765212c7042eb) provides the `emit_shared_stubs_to_interp` template which x86, x86_64 and aarch64 platforms use. Other platforms can use it too. Platforms supporting shared stubs to the interpreter must have `CodeBuffer::supports_shared_stubs()` returning `true`. >> >> ## Results >> **Results from [Renaissance 0.14.0](https://github.com/renaissance-benchmarks/renaissance/releases/tag/v0.14.0)** >> Note: 'Nmethods with shared stubs' is the total number of nmethods counted during benchmark's run. 'Final # of nmethods' is a number of nmethods in CodeCache when JVM exited. 
>> - AArch64
>>
>> +------------------+-------------+----------------------------+---------------------+
>> | Benchmark        | Saved bytes | Nmethods with shared stubs | Final # of nmethods |
>> +------------------+-------------+----------------------------+---------------------+
>> | dotty            |      820544 |                       4592 |               18872 |
>> | dec-tree         |      405280 |                       2580 |               22335 |
>> | naive-bayes      |      392384 |                       2586 |               21184 |
>> | log-regression   |      362208 |                       2450 |               20325 |
>> | als              |      306048 |                       2226 |               18161 |
>> | finagle-chirper  |      262304 |                       2087 |               12675 |
>> | movie-lens       |      250112 |                       1937 |               13617 |
>> | gauss-mix        |      173792 |                       1262 |               10304 |
>> | finagle-http     |      164320 |                       1392 |               11269 |
>> | page-rank        |      155424 |                       1175 |               10330 |
>> | chi-square       |      140384 |                       1028 |                9480 |
>> | akka-uct         |      115136 |                        541 |                3941 |
>> | reactors         |       43264 |                        335 |                2503 |
>> | scala-stm-bench7 |       42656 |                        326 |                3310 |
>> | philosophers     |       36576 |                        256 |                2902 |
>> | scala-doku       |       35008 |                        231 |                2695 |
>> | rx-scrabble      |       32416 |                        273 |                2789 |
>> | future-genetic   |       29408 |                        260 |                2339 |
>> | scrabble         |       27968 |                        225 |                2477 |
>> | par-mnemonics    |       19584 |                        168 |                1689 |
>> | fj-kmeans        |       19296 |                        156 |                1647 |
>> | scala-kmeans     |       18080 |                        140 |                1629 |
>> | mnemonics        |       17408 |                        143 |                1512 |
>> +------------------+-------------+----------------------------+---------------------+
>>
>> - X86_64
>>
>> +------------------+-------------+----------------------------+---------------------+
>> | Benchmark        | Saved bytes | Nmethods with shared stubs | Final # of nmethods |
>> +------------------+-------------+----------------------------+---------------------+
>> | dotty            |      337065 |                       4403 |               19135 |
>> | dec-tree         |      183045 |                       2559 |               22071 |
>> | naive-bayes      |      176460 |                       2450 |               19782 |
>> | log-regression   |      162555 |                       2410 |               20648 |
>> | als              |      121275 |                       1980 |               17179 |
>> | movie-lens       |      111915 |                       1842 |               13020 |
>> | finagle-chirper  |      106350 |                       1947 |               12726 |
>> | gauss-mix        |       81975 |                       1251 |               10474 |
>> | finagle-http     |       80895 |                       1523 |               12294 |
>> | page-rank        |       68940 |                       1146 |               10124 |
>> | chi-square       |       62130 |                        974 |                9315 |
>> | akka-uct         |       50220 |                        555 |                4263 |
>> | reactors         |       23385 |                        371 |                2544 |
>> | philosophers     |       17625 |                        259 |                2865 |
>> | scala-stm-bench7 |       17235 |                        295 |                3230 |
>> | scala-doku       |       15600 |                        214 |                2698 |
>> | rx-scrabble      |       14190 |                        262 |                2770 |
>> | future-genetic   |       13155 |                        253 |                2318 |
>> | scrabble         |       12300 |                        217 |                2352 |
>> | fj-kmeans        |        8985 |                        157 |                1616 |
>> | par-mnemonics    |        8535 |                        155 |                1684 |
>> | scala-kmeans     |        8250 |                        138 |                1624 |
>> | mnemonics        |        7485 |                        134 |                1522 |
>> +------------------+-------------+----------------------------+---------------------+
>>
>>
>> **Testing: fastdebug and release builds for x86, x86_64 and aarch64**
>> - `tier1`...`tier4`: Passed
>> - `hotspot/jtreg/compiler/sharedstubs`: Passed
>
> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8280481C
> - Use call offset instead of caller pc
> - Simplify test
> - Fix x86 build failures
> - Remove UseSharedStubs and clarify shared stub use cases
> - Make SharedStubToInterpRequest ResourceObj and set initial size of SharedStubToInterpRequests to 8
> - Update copyright year and add Unimplemented guards
> - Set UseSharedStubs to true for X86
> - Set UseSharedStubs to true for AArch64
> - Fix x86 build failure
> - ... and 10 more: https://git.openjdk.org/jdk/compare/eee4bf15...da3bfb5b

Lgtm.

-------------

Marked as reviewed by phh (Reviewer).
PR: https://git.openjdk.org/jdk/pull/8816

From duke at openjdk.org Tue Jul 5 20:53:39 2022
From: duke at openjdk.org (Evgeny Astigeevich)
Date: Tue, 5 Jul 2022 20:53:39 GMT
Subject: Integrated: 8280481: Duplicated stubs to interpreter for static calls
In-Reply-To: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com>
References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com>
Message-ID: 

On Fri, 20 May 2022 16:27:51 GMT, Evgeny Astigeevich wrote:

> ## Problem
> Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static Java methods and final Java methods have statically bound information about a callee during compilation. Such calls can share stubs to the interpreter.
>
> Each stub to the interpreter has a relocation record (accessed via `relocInfo`) which provides the address of the stub and the address of its owner. `relocInfo` has an offset, which is an offset from the previously known relocatable address. The address of a stub is calculated as the address provided by the previous `relocInfo` plus the offset.
>
> Each Java call has:
> - A relocation for a call site.
> - A relocation for a stub to the interpreter.
> - A stub to the interpreter.
> - If far jumps are used (arm64 case):
>   - A trampoline relocation.
>   - A trampoline.
>
> We cannot avoid creating relocations. They are needed to support patching call sites.
> With shared stubs there will be multiple relocations having the same stub address but different owners' addresses.
> If we try to generate relocations as we go, there will be a case which requires negative offsets:
>
>     reloc1 ---> 0x0: stub1
>     reloc2 ---> 0x4: stub2 (reloc2.addr = reloc1.addr + reloc2.offset = 0x0 + 4)
>     reloc3 ---> 0x0: stub1 (reloc3.addr = reloc2.addr + reloc3.offset = 0x4 - 4)
>
>
> `CodeSection` does not support negative offsets. It [assumes](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L195) that the addresses relocations point at grow upward.
> Negative offsets reduce the offset range by half. This can increase the number of filler records, the empty `relocInfo` records used to reduce offset values. Also, negative offsets are only needed for `static_stub_type`; the other 13 types don't need them.
>
> ## Solution
> In this PR creation of stubs is done in two stages. First we collect requests for creating shared stubs: a callee `ciMethod*` and an offset of a call in `CodeBuffer` (see [src/hotspot/share/asm/codeBuffer.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8ae)). Then we have the finalisation phase (see [src/hotspot/share/ci/ciEnv.cpp](https://github.com/openjdk/jdk/pull/8816/files#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450)), where `CodeBuffer::finalize_stubs()` creates shared stubs in `CodeBuffer`: a stub and multiple relocations sharing it. The first relocation will have a positive offset. The rest will have zero offsets. This approach does not need negative offsets. As creation of relocations and stubs is platform dependent, `CodeBuffer::finalize_stubs()` calls `CodeBuffer::pd_finalize_stubs()` where platforms should put their code.
>
> This PR provides implementations for x86, x86_64 and aarch64. [src/hotspot/share/asm/codeBuffer.inline.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-c268e3719578f2980edaa27c0eacbe9f620124310108eb65d0f765212c7042eb) provides the `emit_shared_stubs_to_interp` template which x86, x86_64 and aarch64 platforms use. Other platforms can use it too. Platforms supporting shared stubs to the interpreter must have `CodeBuffer::supports_shared_stubs()` returning `true`.
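
A minimal sketch of the two-stage idea from the Solution section, under stated assumptions: it is a toy model, not the HotSpot code. `StubRequest`, `StubReloc` and `finalize_stubs` are hypothetical names, a callee is an `int` standing in for `ciMethod*`, and stub addresses simply start at 0. It only shows why grouping requests by callee before emitting keeps every relocation offset unsigned.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model types; none of these names come from HotSpot.
struct StubRequest {      // stage 1: recorded while call sites are emitted
  int      callee;        // stands in for ciMethod*
  uint32_t call_offset;   // offset of the call site in the code buffer
};

struct StubReloc {        // stage 2 output: one relocation per call site
  uint32_t delta;         // unsigned offset from the previous relocation's address
  uint32_t call_offset;   // owner of the (possibly shared) stub
};

// Stage 2: emit one stub per distinct callee and one relocation per request.
// Requests are grouped by callee, so relocations sharing a stub are adjacent.
std::vector<StubReloc> finalize_stubs(const std::vector<StubRequest>& requests,
                                      uint32_t stub_size) {
  // Group call offsets by callee, remembering the order callees were first seen.
  std::vector<int> order;
  std::map<int, std::vector<uint32_t>> by_callee;
  for (const StubRequest& r : requests) {
    if (by_callee.find(r.callee) == by_callee.end()) order.push_back(r.callee);
    by_callee[r.callee].push_back(r.call_offset);
  }

  std::vector<StubReloc> relocs;
  uint32_t prev_addr = 0;   // address the previous relocation pointed at
  uint32_t next_stub = 0;   // address where the next stub will be placed
  for (int callee : order) {
    const uint32_t stub_addr = next_stub;
    next_stub += stub_size;
    for (uint32_t call_offset : by_callee[callee]) {
      // stub_addr >= prev_addr always holds here, so the delta never goes negative;
      // every relocation after the first one in a group gets delta == 0.
      relocs.push_back({stub_addr - prev_addr, call_offset});
      prev_addr = stub_addr;
    }
  }
  return relocs;
}
```

With the request order from the Problem example (stub1, stub2, stub1), this model yields deltas 0, 0, 4 instead of 0, +4, -4.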
>
> ## Results
> **Results from [Renaissance 0.14.0](https://github.com/renaissance-benchmarks/renaissance/releases/tag/v0.14.0)**
> Note: 'Nmethods with shared stubs' is the total number of nmethods counted during benchmark's run. 'Final # of nmethods' is a number of nmethods in CodeCache when JVM exited.
> - AArch64
>
> +------------------+-------------+----------------------------+---------------------+
> | Benchmark        | Saved bytes | Nmethods with shared stubs | Final # of nmethods |
> +------------------+-------------+----------------------------+---------------------+
> | dotty            |      820544 |                       4592 |               18872 |
> | dec-tree         |      405280 |                       2580 |               22335 |
> | naive-bayes      |      392384 |                       2586 |               21184 |
> | log-regression   |      362208 |                       2450 |               20325 |
> | als              |      306048 |                       2226 |               18161 |
> | finagle-chirper  |      262304 |                       2087 |               12675 |
> | movie-lens       |      250112 |                       1937 |               13617 |
> | gauss-mix        |      173792 |                       1262 |               10304 |
> | finagle-http     |      164320 |                       1392 |               11269 |
> | page-rank        |      155424 |                       1175 |               10330 |
> | chi-square       |      140384 |                       1028 |                9480 |
> | akka-uct         |      115136 |                        541 |                3941 |
> | reactors         |       43264 |                        335 |                2503 |
> | scala-stm-bench7 |       42656 |                        326 |                3310 |
> | philosophers     |       36576 |                        256 |                2902 |
> | scala-doku       |       35008 |                        231 |                2695 |
> | rx-scrabble      |       32416 |                        273 |                2789 |
> | future-genetic   |       29408 |                        260 |                2339 |
> | scrabble         |       27968 |                        225 |                2477 |
> | par-mnemonics    |       19584 |                        168 |                1689 |
> | fj-kmeans        |       19296 |                        156 |                1647 |
> | scala-kmeans     |       18080 |                        140 |                1629 |
> | mnemonics        |       17408 |                        143 |                1512 |
> +------------------+-------------+----------------------------+---------------------+
>
> - X86_64
>
> +------------------+-------------+----------------------------+---------------------+
> | Benchmark        | Saved bytes | Nmethods with shared stubs | Final # of nmethods |
> +------------------+-------------+----------------------------+---------------------+
> | dotty            |      337065 |                       4403 |               19135 |
> | dec-tree         |      183045 |                       2559 |               22071 |
> | naive-bayes      |      176460 |                       2450 |               19782 |
> | log-regression   |      162555 |                       2410 |               20648 |
> | als              |      121275 |                       1980 |               17179 |
> | movie-lens       |      111915 |                       1842 |               13020 |
> | finagle-chirper  |      106350 |                       1947 |               12726 |
> | gauss-mix        |       81975 |                       1251 |               10474 |
> | finagle-http     |       80895 |                       1523 |               12294 |
> | page-rank        |       68940 |                       1146 |               10124 |
> | chi-square       |       62130 |                        974 |                9315 |
> | akka-uct         |       50220 |                        555 |                4263 |
> | reactors         |       23385 |                        371 |                2544 |
> | philosophers     |       17625 |                        259 |                2865 |
> | scala-stm-bench7 |       17235 |                        295 |                3230 |
> | scala-doku       |       15600 |                        214 |                2698 |
> | rx-scrabble      |       14190 |                        262 |                2770 |
> | future-genetic   |       13155 |                        253 |                2318 |
> | scrabble         |       12300 |                        217 |                2352 |
> | fj-kmeans        |        8985 |                        157 |                1616 |
> | par-mnemonics    |        8535 |                        155 |                1684 |
> | scala-kmeans     |        8250 |                        138 |                1624 |
> | mnemonics        |        7485 |                        134 |                1522 |
> +------------------+-------------+----------------------------+---------------------+
>
>
> **Testing: fastdebug and release builds for x86, x86_64 and aarch64**
> - `tier1`...`tier4`: Passed
> - `hotspot/jtreg/compiler/sharedstubs`: Passed

This pull request has now been integrated.

Changeset: 35156041 Author: Evgeny Astigeevich Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/351560414d7ddc0694126ab184bdb78be604e51f Stats: 491 lines in 22 files changed: 458 ins; 5 del; 28 mod 8280481: Duplicated stubs to interpreter for static calls Reviewed-by: kvn, phh ------------- PR: https://git.openjdk.org/jdk/pull/8816 From mdoerr at openjdk.org Tue Jul 5 21:06:40 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 5 Jul 2022 21:06:40 GMT Subject: [jdk19] RFR: 8288128: S390X: Fix crashes after JDK-8284161 (Virtual Threads) In-Reply-To: <6k2RcCL12vxkAOeWKadgithVNKDo2lVd-BI0xJPSR60=.ebb5f652-d658-4be9-a2cf-12ebafe371ac@github.com> References: <6k2RcCL12vxkAOeWKadgithVNKDo2lVd-BI0xJPSR60=.ebb5f652-d658-4be9-a2cf-12ebafe371ac@github.com> Message-ID: On Tue, 5 Jul 2022 17:38:32 GMT, Tyler Steele wrote: > This PR adapts the changes to the PPC port from JDK-8286446 and JDK-8288105 to s390 in order to fix the crashes preventing a successful `make images`. > > In addition, the following (minor) changes were made to the changesets above: > - Change two instances of `own_abi()->lr` to `own_abi()->return_pc` (frame_s390.cpp:235,247) > - Remove alignment assertion from `frame::setup` (frame_s390.inline.hpp:73) > - Remove original_pc assertion from `frame::patch_pc` (frame_s390.cpp:251) > - Add continuations_enabled guard to `generate_phase1` (stubGenerator_s390.cpp:2937) Looks good! Thanks for porting it to s390! ------------- Marked as reviewed by mdoerr (Reviewer). 
PR: https://git.openjdk.org/jdk19/pull/110 From tsteele at openjdk.org Tue Jul 5 21:15:35 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Tue, 5 Jul 2022 21:15:35 GMT Subject: [jdk19] Integrated: 8288128: S390X: Fix crashes after JDK-8284161 (Virtual Threads) In-Reply-To: <6k2RcCL12vxkAOeWKadgithVNKDo2lVd-BI0xJPSR60=.ebb5f652-d658-4be9-a2cf-12ebafe371ac@github.com> References: <6k2RcCL12vxkAOeWKadgithVNKDo2lVd-BI0xJPSR60=.ebb5f652-d658-4be9-a2cf-12ebafe371ac@github.com> Message-ID: On Tue, 5 Jul 2022 17:38:32 GMT, Tyler Steele wrote: > This PR adapts the changes to the PPC port from JDK-8286446 and JDK-8288105 to s390 in order to fix the crashes preventing a successful `make images`. > > In addition, the following (minor) changes were made to the changesets above: > - Change two instances of `own_abi()->lr` to `own_abi()->return_pc` (frame_s390.cpp:235,247) > - Remove alignment assertion from `frame::setup` (frame_s390.inline.hpp:73) > - Remove original_pc assertion from `frame::patch_pc` (frame_s390.cpp:251) > - Add continuations_enabled guard to `generate_phase1` (stubGenerator_s390.cpp:2937) This pull request has now been integrated. 
Changeset: 0b6fd482 Author: Tyler Steele URL: https://git.openjdk.org/jdk19/commit/0b6fd4820c1f98d6154d7182345273a4c9468af5 Stats: 193 lines in 7 files changed: 111 ins; 41 del; 41 mod 8288128: S390X: Fix crashes after JDK-8284161 (Virtual Threads) Reviewed-by: mdoerr ------------- PR: https://git.openjdk.org/jdk19/pull/110 From tsteele at openjdk.org Tue Jul 5 21:15:34 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Tue, 5 Jul 2022 21:15:34 GMT Subject: [jdk19] RFR: 8288128: S390X: Fix crashes after JDK-8284161 (Virtual Threads) In-Reply-To: <6k2RcCL12vxkAOeWKadgithVNKDo2lVd-BI0xJPSR60=.ebb5f652-d658-4be9-a2cf-12ebafe371ac@github.com> References: <6k2RcCL12vxkAOeWKadgithVNKDo2lVd-BI0xJPSR60=.ebb5f652-d658-4be9-a2cf-12ebafe371ac@github.com> Message-ID: On Tue, 5 Jul 2022 17:38:32 GMT, Tyler Steele wrote: > This PR adapts the changes to the PPC port from JDK-8286446 and JDK-8288105 to s390 in order to fix the crashes preventing a successful `make images`. > > In addition, the following (minor) changes were made to the changesets above: > - Change two instances of `own_abi()->lr` to `own_abi()->return_pc` (frame_s390.cpp:235,247) > - Remove alignment assertion from `frame::setup` (frame_s390.inline.hpp:73) > - Remove original_pc assertion from `frame::patch_pc` (frame_s390.cpp:251) > - Add continuations_enabled guard to `generate_phase1` (stubGenerator_s390.cpp:2937) Thanks Martin :-) ------------- PR: https://git.openjdk.org/jdk19/pull/110 From lmesnik at openjdk.org Wed Jul 6 00:02:41 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 6 Jul 2022 00:02:41 GMT Subject: RFR: 8289436: Make the redefine timer statistics more accurate In-Reply-To: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> References: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> Message-ID: <2GPjX7-TT1PXfj6eTRxgX-Pt6RZHvKv8z5HOYaYY2NQ=.8b789522-69ef-4291-8ef1-91381802ff84@github.com> On 
Wed, 29 Jun 2022 08:30:12 GMT, Tongbao Zhang wrote:

> Make the redefine timer statistics more accurate
>
> After some significant performance improvements to class redefinition, such as:
> https://bugs.openjdk.org/browse/JDK-8139551
> https://bugs.openjdk.org/browse/JDK-8078725
>
> some time-consuming operations were moved out of "redefine_single_class".
> As a result, the sum of the phase 1 and phase 2 times no longer accurately represents the time of "vmop_doit".

Marked as reviewed by lmesnik (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/9322

From iklam at openjdk.org Wed Jul 6 01:07:04 2022
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 6 Jul 2022 01:07:04 GMT
Subject: RFR: 8289780: Remove Forte::register_stub
Message-ID: 

I removed `Forte::register_stub` since it's used only by the Solaris Forte(TM) Performance Tools collector, which is no longer supported by the JDK.

I also fixed a couple of places where the stub name is computed unnecessarily.

Also renamed some `#ifndef IA64` around the code that I touched.

-------------

Commit messages:
 - 8289780: Remove Forte::register_stub

Changes: https://git.openjdk.org/jdk/pull/9386/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9386&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8289780
  Stats: 112 lines in 8 files changed: 6 ins; 102 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/9386.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9386/head:pull/9386

PR: https://git.openjdk.org/jdk/pull/9386

From jiefu at openjdk.org Wed Jul 6 02:24:13 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Wed, 6 Jul 2022 02:24:13 GMT
Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633
Message-ID: 

Hi all,

We observed ZGC crashes after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling.

For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60.
And `line` will be allocated by `getline` @line84.

    53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const {
    54   char* line_mountpoint = NULL;
    55   char* line_filesystem = NULL;
    56
    57   // Parse line and return a newly allocated string containing the mount point if
    58   // the line contains a matching filesystem and the mount point is accessible by
    59   // the current user.
    60   if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 ||
    61       strcmp(line_filesystem, filesystem) != 0 ||
    62       access(line_mountpoint, R_OK|W_OK|X_OK) != 0) {
    63     // Not a matching or accessible filesystem
    64     os::free(line_mountpoint);
    65     line_mountpoint = NULL;
    66   }
    67
    68   os::free(line_filesystem);
    69
    70   return line_mountpoint;
    71 }
    72
    73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray<char*>* mountpoints) const {
    74   FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r");
    75   if (fd == NULL) {
    76     ZErrno err;
    77     log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string());
    78     return;
    79   }
    80
    81   char* line = NULL;
    82   size_t length = 0;
    83
    84   while (getline(&line, &length, fd) != -1) {
    85     char* const mountpoint = get_mountpoint(line, filesystem);
    86     if (mountpoint != NULL) {
    87       mountpoints->append(mountpoint);
    88     }
    89   }
    90
    91   os::free(line);
    92   fclose(fd);
    93 }

See the analysis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477

That means we have raw `::malloc() -> os::free()`, which is unbalanced. Raw `::malloc()` does not write the header `os::free()` expects. If NMT is on, we assert now, because NMT does not find its header in `os::free()`.

The fix just reverts `os::free()` to `::free()`.

Testing:
 - hotspot/jtreg/gc/z on Linux/x64, all passed

Thanks.
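
A minimal sketch of the pairing rule behind this fix, assuming Linux/glibc (for `%ms` and `getline`). This is not the HotSpot code: `parse_mountpoint` and `find_mountpoints` are hypothetical helpers, and the `access()` check from the listing is left out. The point is only that strings allocated by the C library via `sscanf("%ms")` and `getline()` are released with plain `free()`.

```cpp
#include <stdio.h>    // sscanf, getline (POSIX)
#include <stdlib.h>   // free
#include <string.h>   // strcmp
#include <string>
#include <vector>

// Hypothetical helper, not the HotSpot code: returns the mount point on a
// mountinfo-style line when its filesystem type matches, otherwise "".
std::string parse_mountpoint(const char* line, const char* filesystem) {
  char* mountpoint = nullptr;
  char* fstype = nullptr;
  std::string result;

  // "%ms" asks sscanf to allocate each string with the C library's malloc().
  const int matched = sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms",
                             &mountpoint, &fstype);
  if (matched == 2 && strcmp(fstype, filesystem) == 0) {
    result = mountpoint;
  }

  // Memory from the C library goes back to the C library: plain free(),
  // not an allocator wrapper that expects its own bookkeeping header.
  free(mountpoint);
  free(fstype);
  return result;
}

// Same pairing rule for getline(): the buffer it grows is released with free().
std::vector<std::string> find_mountpoints(FILE* file, const char* filesystem) {
  std::vector<std::string> found;
  char* line = nullptr;
  size_t capacity = 0;
  while (getline(&line, &capacity, file) != -1) {
    std::string mountpoint = parse_mountpoint(line, filesystem);
    if (!mountpoint.empty()) {
      found.push_back(mountpoint);
    }
  }
  free(line);
  return found;
}
```

Copying the result into a `std::string` before freeing is a choice made for the sketch; the listing above instead hands the raw pointer to the caller.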
Best regards,
Jie

-------------

Commit messages:
 - 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633

Changes: https://git.openjdk.org/jdk/pull/9387/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9387&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8289778
  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/9387.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9387/head:pull/9387

PR: https://git.openjdk.org/jdk/pull/9387

From mgronlun at openjdk.org Wed Jul 6 13:00:28 2022
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Wed, 6 Jul 2022 13:00:28 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2]
In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com>
References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com>
Message-ID: 

On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote:

>> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>
>   Incorporate JIT compiler restart into EventSweepCodeCache

Yes, thank you, I took a look at those fields too. Unfortunately, there is no value that bears directly on "current in-use". The reserved and committed will not be updated.

If there was a value that could reflect the current usage, even if it means adding it to both EventCodeCacheFull and the JIT restart, that would be perfect. EventCodeCacheFull because in-use is x, JIT restart because in-use is now y.

There are some statistics-related properties in the CodeHeap, like for example heap->unallocated_capacity(); et al. Could some of those expose this running value perhaps?

------------- PR: https://git.openjdk.org/jdk/pull/9334 From dlong at openjdk.org Wed Jul 6 02:51:56 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Jul 2022 02:51:56 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 08:31:31 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: > > - Add an "i2i" entry to enterSpecial > - Fix comment I like the new version. 
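
A toy model of the three steps listed in the quoted description, offered only as a sketch under stated assumptions. `MethodEntries`, `CallSite`, `resolve_static_call` and `enter_interp_only_mode` are hypothetical stand-ins, not HotSpot types or functions, and addresses are plain integers. Steps 2 and 3 are folded into one function for brevity.

```cpp
#include <cstdint>

// Hypothetical stand-ins; these are not HotSpot types.
struct MethodEntries {
  uintptr_t verified_code_entry;  // entry point of the compiled version
  uintptr_t c2i_entry;            // compiled-to-interpreter adapter
};

struct CallSite {
  uintptr_t target = 0;           // 0 means "unresolved"
};

// Models steps 2 and 3: a thread in interp-only mode is routed to the c2i
// entry and the call site is left unpatched, so later calls resolve again.
uintptr_t resolve_static_call(CallSite& site, const MethodEntries& callee,
                              bool interp_only_mode) {
  if (interp_only_mode) {
    return callee.c2i_entry;      // deliberately not cached in the call site
  }
  site.target = callee.verified_code_entry;  // normal path patches the call site
  return site.target;
}

// Models step 1: entering interp-only mode clears an already resolved site.
void enter_interp_only_mode(CallSite& site) {
  site.target = 0;
}
```

The model also makes the caveat from the description visible: nothing stops a second caller with `interp_only_mode == false` from patching the same `CallSite` again.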
------------- PR: https://git.openjdk.org/jdk19/pull/66 From rpressler at openjdk.org Wed Jul 6 08:59:46 2022 From: rpressler at openjdk.org (Ron Pressler) Date: Wed, 6 Jul 2022 08:59:46 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 02:19:28 GMT, Dean Long wrote: >> Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add an "i2i" entry to enterSpecial >> - Fix comment > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1058: > >> 1056: >> 1057: address mark = __ pc(); >> 1058: __ trampoline_call1(resolve, NULL, false); > > I don't think it's necessary to call the resolve stub when in interpreted mode. Can't we just call the Method's c2i adapter just like the interpreter would? I guess there might be a startup issue if the adapter hasn't been generated yet. I couldn't find code that does that and could be easily reused. > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1322: > >> 1320: >> 1321: __ pop(rax); // return address >> 1322: // Read interpreter arguments into registers (this is an ad-hoc i2c adapter) > > If I understand this correctly, this allows you to avoid creating an interpreted frame. Pretty clever! Yeah, it's a hand-rolled i2c adapter. I thought of just calling `gen_i2c_adapter` in place, but that would have required changing it. If we had two separate nmethods, we could rely on the standard i2c, but having two nmethods for a single Method didn't seem safe at this time. We can revisit later. 
------------- PR: https://git.openjdk.org/jdk19/pull/66 From rehn at openjdk.org Wed Jul 6 12:50:39 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Jul 2022 12:50:39 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v4] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 09:44:23 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with one additional commit since the last revision: > > Changes following review comments Marked as reviewed by rehn (Reviewer). 
------------- PR: https://git.openjdk.org/jdk19/pull/66 From tschatzl at openjdk.org Wed Jul 6 09:15:28 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 6 Jul 2022 09:15:28 GMT Subject: RFR: 8289739: Add G1 specific GC breakpoints for testing In-Reply-To: <7AwZiEMEyxFdoAvUPCHSEjDKyS3f6JFJZYLFHen47-A=.a7589631-44ea-419b-bc23-20a5a75a86b5@github.com> References: <7AwZiEMEyxFdoAvUPCHSEjDKyS3f6JFJZYLFHen47-A=.a7589631-44ea-419b-bc23-20a5a75a86b5@github.com> Message-ID: On Wed, 6 Jul 2022 07:51:34 GMT, Albert Mingkun Yang wrote: > It's unclear to me why "BEFORE REBUILD COMPLETED" is placed inside `phase_cleanup`; a more natural place to me is at the end of `G1ConcurrentMarkThread::phase_rebuild_remembered_sets`. (The same goes for the other pair of breakpoints.) (Concurrent) rebuild completes before the cleanup pause. This follows the existing scheme, e.g. in `subphase_remark()` there is the `BEFORE MARKING COMPLETED` breakpoint before the remark pause (which completes the marking). The Cleanup pause is also a kind of extension of the rebuild remset phase as it acts on actually this information (and it has not been "cleaning up" anything relevant for a long time since JDK 11 iirc). This seems to be a separate, pre-existing issue. > > Sort of a preexisting issue but this change builds on top of it: "cleanup" can mean both the "cleanup pause" (`phase_cleanup`) or the "concurrent bitmap clearing" (`phase_clear_bitmap_for_next_mark`). IMO, it's better to get rid of such overloading. The method names are different, one is called `cleanup` and the other `clear_bitmap_for_next_mark` phase for this reason. Maybe you suggest to rename the "Cleanup" pause and/or the "cleanup" use for the concurrent phase in the remaining code? This renaming seems to be a separate (pre-existing) issue and related to above. > > (Not specific to this change. 
I also find the breakpoints naming pattern, `before-X-started & after-X-completed`, rather odd -- since they are break**points**, names reflecting a particular point in time, instead of a period, would have been more accurate, sth like `at-X-start & at-X-end`.) The "at" seems to be more specific and could be changed separately and seems to be pre-existing. ------------- PR: https://git.openjdk.org/jdk/pull/9376 From chagedorn at openjdk.org Wed Jul 6 07:28:01 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Jul 2022 07:28:01 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v13] In-Reply-To: References: Message-ID: > When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method: > > Stack: [0x00007f6e01739000,0x00007f6e0183a000], sp=0x00007f6e01838110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 > V [libjvm.so+0x1206ada] Thread::call_run()+0x180 > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f > > This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method. 
> > This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)): > > Stack: [0x00007f34fca18000,0x00007f34fcb19000], sp=0x00007f34fcb17110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 (c1_Compilation.cpp:607) > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec (c1_Compiler.cpp:250) > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 (compileBroker.cpp:2291) > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df (compileBroker.cpp:1966) > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 (compilerThread.cpp:59) > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d (thread.cpp:1297) > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 (thread.cpp:1280) > V [libjvm.so+0x1206ada] Thread::call_run()+0x180 (thread.cpp:358) > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f (os_linux.cpp:705) > > For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4. > > The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf > I added references to the corresponding sections throughout the code. 
However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec. > > The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability. > > Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace` which provides logging messages throughout parsing. > > **Testing:** > Apart from manual testing, I've added two kinds of tests: > - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers. 
> - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename. > > On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation. > > To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number. > > I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user). > > Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`! > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 69 commits: - Merge branch 'master' into JDK-8242181 - Exclude TestDwarf.java when run with product because TraceDwarf is a develop flag - Merge branch 'master' into JDK-8242181 - Merge branch 'master' into JDK-8242181 - Fix TestDwarf for older GCC versions - Change logging from UL to tty based with new TraceDwarfLevel develop flag - Add support to parse the .debug_line section in DWARF 2 as emitted by GCC 8, add some comments - Merge branch 'master' into JDK-8242181 - Merge branch 'master' into JDK-8242181 - Merge branch 'master' into JDK-8242181 - ... and 59 more: https://git.openjdk.org/jdk/compare/d8f4e97b...33c924ef ------------- Changes: https://git.openjdk.org/jdk/pull/7126/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=7126&range=12 Stats: 2781 lines in 18 files changed: 2684 ins; 41 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/7126.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7126/head:pull/7126 PR: https://git.openjdk.org/jdk/pull/7126 From dlong at openjdk.org Wed Jul 6 02:22:48 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Jul 2022 02:22:48 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 08:31:31 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. 
In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: > > - Add an "i2i" entry to enterSpecial > - Fix comment src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1058: > 1056: > 1057: address mark = __ pc(); > 1058: __ trampoline_call1(resolve, NULL, false); I don't think it's necessary to call the resolve stub when in interpreted mode. Can't we just call the Method's c2i adapter just like the interpreter would? I guess there might be a startup issue if the adapter hasn't been generated yet. ------------- PR: https://git.openjdk.org/jdk19/pull/66 From coleenp at openjdk.org Wed Jul 6 13:14:40 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Jul 2022 13:14:40 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp [v2] In-Reply-To: References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: On Tue, 5 Jul 2022 16:47:27 GMT, Ioi Lam wrote: >> Please review this simple change that only renames a few classes and moved some code around. No functional changes. >> >> The following classes are used only sparingly. 
They should be moved to a new header file share/runtime/suspend.hpp to minimize the size of os.hpp >> >> - SuspendedThreadTaskContext >> - SuspendedThreadTask >> - SuspendResume >> >> I didn't move the OS-specific implementation to a new file -- the POSIX implementation is currently inside [signals_posix.cpp](https://github.com/openjdk/jdk/blob/df063f7db18a40ea7325fe608b3206a6dff812c1/src/hotspot/os/posix/signals_posix.cpp#L1790) mixed with other signal handling code, so it doesn't seem a good idea to move out just the code for the above 3 classes. >> >> The only other implementation is in os_windows.cpp. I could move the code to suspend_windows.cpp, but I don't feel very motivated unless someone insists. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > moved SuspendResume class to os/posix directory The movement out of class os looks like an improvement to me. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/9371 From david.holmes at oracle.com Wed Jul 6 12:00:27 2022 From: david.holmes at oracle.com (David Holmes) Date: Wed, 6 Jul 2022 22:00:27 +1000 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> Message-ID: On 6/07/2022 11:52 am, Ian Rogers wrote: > An old "won't fix" bug (JDK-8199919 : Deprecate JNI critical APIs): > https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8199919 > and thread: > https://mail.openjdk.org/pipermail/core-libs-dev/2018-March/052153.html > > I've not tracked recent JDK changes so I'd be interested to know if > -Xcheck:jni can now meaningfully be turned on for development code. 
> Please obsolete/deprecate JNI criticals :-) This thread is not about the JNI GetPrimitiveArrayCritical (and other) APIs. I'm not clear if you are just flagging a general concern about any kind of API that might introduce TTSP issues? Cheers, David > Thanks, > Ian > > On Tue, Jul 5, 2022 at 3:05 AM Erik Osterlund wrote: >> >> Hi, >> >> Here is a clarification on the ZGC interactions. >> >> The initial form of JNI critical native calls was implemented as an internal thing for SPARC crypto libraries, private to the JDK. JNI calls on SPARC involved flushing register windows, which was actually rather slow. >> >> This form came with a mechanism for lazily activating the GC locker for primitive arrays that the crypto code needed direct access to. This essentially deferred invoking the GC locker from the Java thread to the safepoint synchronizer. >> >> The problematic aspect for generational ZGC was the async GC locker interactions. Its implication is that each GC safepoint might fail, because the GC locker can't be locked out before the safepoint is synchronized, so you end up instead trying to lock it inside GC safepoints, only to find that you couldn't. >> >> The failed GC safepoints lead to GC operations instead being started asynchronously from the GC locker. That was easier to deal with for the mainline version of ZGC since there was only one type of GC: full GCs. So we coped. >> >> With generational ZGC, the asynchronous operation has to figure out if it should poke the minor (young) and/or major (young + old) GC drivers. That problem is not easy to solve. However with JNI critical natives gone, the entire GC locker for ZGC is just a simple readers-writer lock, where critical native functions use the readers lock and the GC operations use the writer lock. The GC safepoints can't fail.
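As a rough illustration of the readers-writer scheme described in the paragraph above, the following Java sketch models critical regions as read-lock holders and a GC operation as the write-lock holder. It is only a model under stated assumptions: the real GC locker is C++ code inside HotSpot, and the class and method names here (`GcLockerSketch`, `inCriticalRegion`, `runGcOperation`) are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a GC locker built on a readers-writer lock.
// Not HotSpot code; names are invented for illustration.
public class GcLockerSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // A critical region: any number of threads may be inside at once.
    public void inCriticalRegion(Runnable body) {
        lock.readLock().lock();
        try {
            body.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    // A GC operation: blocks until no critical region is active,
    // then runs exclusively, so it cannot "fail" halfway.
    public void runGcOperation(Runnable gc) {
        lock.writeLock().lock();
        try {
            gc.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Number of read locks currently held (active critical regions).
    public int activeCriticalRegions() {
        return lock.getReadLockCount();
    }

    public static void main(String[] args) {
        GcLockerSketch locker = new GcLockerSketch();
        locker.inCriticalRegion(() ->
            System.out.println("inside critical region, readers = "
                + locker.activeCriticalRegions()));
        locker.runGcOperation(() ->
            System.out.println("inside GC operation, readers = "
                + locker.activeCriticalRegions()));
    }
}
```

The point of the design is that the writer simply waits for readers to drain instead of discovering inside the safepoint that the lock could not be taken.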
>> >> With the new implementation that avoids doing a transition to native at all, the mentioned problem no longer occurs, as the safepoint synchronizer won't allow safepoints to creep in right in the middle of all this. So it would seem we are okay with that. So I think as long as we don't go with the previous async GC locker solution, we can remove ZGC interactions from the equation. >> >> However, you obviously get a trust problem instead with this flavour of cheating the system. Anything that takes a longish time in a critical native function without a native transition is going to be a disaster and hang the entire JVM. That is typically something we do not take lightly and is indeed why we have native transitions. >> >> So I would be delighted if we didn't resurrect ways of cheating the system anyway, unless this is absolutely... critical. It took a long time to get rid of the cheats. >> >> /Erik >> >> On 4 Jul 2022, at 18:07, Maurizio Cimadamore wrote: >> >> Hi, >> while I'm not an expert with some of the IO calls you mention (some of my colleagues are more knowledgeable in this area, so I'm sure they will have more info), my general sense is that, as with getrusage, if there is a system call involved, you already pay a hefty price for the user to kernel transition. On my machine this seems to cost around 200ns. In these cases, using JNI critical to shave off a dozen nanoseconds (at best!) seems just not worth it. >> >> So, of the functions in your list, the ones in which I *believe* dropping transitions would have the most effect are (if we exclude getpid, for which another approach is possible) clock_gettime and getcpu, I believe, as they might use vdso [1], which typically brings the performance of these calls closer to calls to shared lib functions. >> >> If you have examples e.g.
where performance of recvmsg (or related calls) varies significantly between base JNI and critical JNI, please send them our way; I'm sure some of my colleagues would be interested to take a look. >> >> Popping back a couple of levels, I think it would be helpful to also define what's an acceptable regression in this context. Of course, in an ideal world, we'd like to see no performance regression at all. But JNI critical is an unsupported interface, which might misbehave with modern garbage collectors (e.g. ZGC) and that requires quite a bit of internal complexity which might, in the medium/long run, hinder the evolution of the Java platform (all these things have _some_ cost, even if the cost is not directly material to developers). In this vein, I think calls like clock_gettime tend to be more problematic: as they complete very quickly, you see the cost of transitions a lot more. In other cases, where syscalls are involved, the costs associated with transitions are more likely to be "in the noise". Of course if we look at absolute numbers, dropping transitions would always yield "faster" code; but at the same time, going from 250ns to 245ns is very unlikely to result in a visible performance difference when considering an application as a whole, so I think it's critical here to decide _which_ use cases to prioritize. >> >> I think a good outcome of this discussion would be if we could come to some shared understanding of which native calls are truly problematic (e.g. clock_gettime-like), and then for the JDK to provide better (and more maintainable) alternatives for those (which might even be faster than using critical JNI). >> >> Thanks >> Maurizio >> >> [1] - https://man7.org/linux/man-pages/man7/vdso.7.html >> >> On 04/07/2022 12:23, Wojciech Kudla wrote: >> >> Thanks Maurizio, >> >> I raised this case mainly about clock_gettime and recvmsg/sendmsg, I think we're focusing on the wrong things here.
Feel free to drop the two syscalls from the discussion entirely, but the main usecases I have been presenting throughout this thread definitely stand. >> >> Thanks >> >> >> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore wrote: >>> >>> Hi Wojtek, >>> thanks for sharing this list, I think this is a good starting point to understand more about your use case. >>> >>> Last week I've been looking at "getrusage" (as you mentioned it in an earlier email), and I was surprised to see that the call took a pointer to a (fairly big) struct which then needed to be initialized with some thread-local state: >>> >>> https://man7.org/linux/man-pages/man2/getrusage.2.html >>> >>> I've looked at the implementation, and it seems to be doing memset on the user-provided struct pointer, plus all the fields assignment. Eyeballing the implementation, this does not seem to me like a "classic" use case where dropping transition would help much. I mean, surely dropping transitions would help shaving some nanoseconds off the call, but it doesn't seem to me that the call would be shortlived enough to make a difference. Do you have some benchmarks on this one? I did some [1] and the call overhead seemed to come up at 260ns/op - w/o transition you might perhaps be able to get to 250ns, but that's in the noise? >>> >>> As for getpid, note that you can do (since Java 9): >>> >>> ProcessHandle.current().pid(); >>> >>> I believe the impl caches the result, so it shouldn't even make the native call. >>> >>> Maurizio >>> >>> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >>> >>> On 02/07/2022 07:42, Wojciech Kudla wrote: >>> >>> Hi Maurizio, >>> >>> Thanks for staying on this. >>> >>>> Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? 
>>> >>> Off the top of my head: >>> clock_gettime >>> recvmsg >>> recvmmsg >>> sendmsg >>> sendmmsg >>> select >>> getpid >>> getcpu >>> getrusage >>> >>>> Also, could you please tell us whether any of these calls need to interact with Java arrays? >>> No arrays or objects of any type involved. Everything happens by the means of passing raw pointers as longs and using other primitive types as function arguments. >>> >>>> In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? >>> Critical JNI natives are used solely to remove the cost of transitions. We don't get anywhere near the Java heap in native code. >>> >>> In general I think it makes a lot of sense for Java as a language/platform to have some guards around unsafe code, but on the other hand the popularity of libraries employing Unsafe and their success in more performance-oriented corners of software engineering is a clear indicator there is a need for the JVM to provide access to more low-level primitives and mechanisms. >>> I think it's entirely fair to tell developers that all bets are off when they get into some non-idiomatic scenarios but please don't take away a feature that greatly contributed to Java's success. >>> >>> Kind regards, >>> Wojtek >>> >>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore wrote: >>>> >>>> Hi Wojciech, >>>> picking up this thread again. After some internal discussion, we realize that we don't know enough about your use case.
While re-enabling JNI critical would obviously provide a quick fix, we're afraid that (a) developers might end up depending on JNI critical when they don't need to (perhaps also unaware of the consequences of depending on it) and (b) that there might actually be _better_ (as in: much faster) solutions than using critical native calls to address at least some of your use cases (that seemed to be the case with the clock_gettime example you mentioned). Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? Also, could you please tell us whether any of these calls need to interact with Java arrays? In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? >>>> >>>> Regards >>>> Maurizio >>>> >>>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>> >>>> Hi Mark, >>>> >>>> Thanks for your input and apologies for the delayed response. >>>> >>>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>> that help developers in your unnamed industry? >>>> >>>> Exposing a realtime clock with nanosecond granularity in the JDK would be a great step forward. I should have made it clear that I represent the fintech corner (investment banking to be exact) but the issues my message touches upon span areas such as HPC, audio processing, gaming, and the defense industry so it's not like we have an isolated case. >>>> >>>>> In a similar vein, if people are finding it necessary to "replace parts >>>> of NIO with hand-crafted native code"
then it would be interesting to >>>> understand what their requirements are >>>> >>>> As for the other example I provided with making very short-lived syscalls such as recvmsg/recvmmsg the premise is getting access to hardware timestamps on the ingress and egress ends as well as enabling batch receive with a single syscall and otherwise exploiting features unavailable from the JDK (like access to CMSG interface, scatter/gather, etc). >>>> There are also other examples of calls that we'd love to make often and at lowest possible cost (e.g. getrusage) but I'm not sure if there's a strong case for some of these ideas, that's why it might be worth looking into a more generic approach for performance sensitive code. >>>> Hope this does a better job at explaining where we're coming from than my previous messages. >>>> >>>> Thanks, >>>> W >>>> >>>> On Tue, Jun 7, 2022 at 6:31 PM wrote: >>>>> >>>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>>>>> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>>>>>> CLOCK_REALTIME. >>>>>> >>>>>> Unfortunately System.currentTimeMillis() offers only millisecond >>>>>> granularity which is the reason why our industry has to resort to >>>>>> clock_gettime. >>>>> >>>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>>> that help developers in your unnamed industry? >>>>> >>>>> In a similar vein, if people are finding it necessary to "replace parts >>>>> of NIO with hand-crafted native code" then it would be interesting to >>>>> understand what their requirements are. Some simple enhancements to >>>>> the NIO API would be much less costly to design and implement than a >>>>> generalized user-level native-call intrinsification mechanism.
>>>>> >>>>> - Mark From ayang at openjdk.org Wed Jul 6 07:54:26 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 6 Jul 2022 07:54:26 GMT Subject: RFR: 8289739: Add G1 specific GC breakpoints for testing In-Reply-To: References: Message-ID: <7AwZiEMEyxFdoAvUPCHSEjDKyS3f6JFJZYLFHen47-A=.a7589631-44ea-419b-bc23-20a5a75a86b5@github.com> On Tue, 5 Jul 2022 11:35:19 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that adds a few G1 specific GC breakpoints for future testing [JDK-8289740](https://bugs.openjdk.org/browse/JDK-8289740). > > Testing: gha, local testing > > Thanks, > Thomas It's unclear to me why "BEFORE REBUILD COMPLETED" is placed inside `phase_cleanup`; a more natural place to me is at the end of `G1ConcurrentMarkThread::phase_rebuild_remembered_sets`. (The same goes for the other pair of breakpoints.) Sort of a preexisting issue but this change builds on top of it: "cleanup" can mean both the "cleanup pause" (`phase_cleanup`) or the "concurrent bitmap clearing" (`phase_clear_bitmap_for_next_mark`). IMO, it's better to get rid of such overloading. (Not specific to this change. I also find the breakpoints naming pattern, `before-X-started & after-X-completed`, rather odd -- since they are break**points**, names reflecting a particular point in time, instead of a period, would have been more accurate, sth like `at-X-start & at-X-end`.) 
------------- PR: https://git.openjdk.org/jdk/pull/9376 From irogers at google.com Wed Jul 6 01:52:01 2022 From: irogers at google.com (Ian Rogers) Date: Tue, 5 Jul 2022 18:52:01 -0700 Subject: Obsoleting JavaCritical In-Reply-To: <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> Message-ID: An old "won't fix" bug (JDK-8199919 : Deprecate JNI critical APIs): https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8199919 and thread: https://mail.openjdk.org/pipermail/core-libs-dev/2018-March/052153.html I've not tracked recent JDK changes so I'd be interested to know if -Xcheck:jni can now meaningfully be turned on for development code. Please obsolete/deprecate JNI criticals :-) Thanks, Ian On Tue, Jul 5, 2022 at 3:05 AM Erik Osterlund wrote: > > Hi, > > Here is a clarification on the ZGC interactions. > > The initial form of JNI critical native calls was implemented as an internal thing for SPARC crypto libraries, private to the JDK. JNI calls on SPARC involved flushing register windows, which was actually rather slow. > > This form came with a mechanism for lazily activating the GC locker for primitive arrays that the crypto code needed direct access to. This essentially deferred invoking the GC locker from the Java thread to the safepoint synchronizer. > > The problematic aspect for generational ZGC was the async GC locker interactions. Its implication is that each GC safepoint might fail, because the GC locker can't be locked out before the safepoint is synchronized, so you end up instead trying to lock it inside GC safepoints, only to find that you couldn't.
> > The failed GC safepoints lead to GC operations instead being started asynchronously from the GC locker. That was easier to deal with for the mainline version of ZGC since there was only one type of GC: full GCs. So we coped. > > With generational ZGC, the asynchronous operation has to figure out if it should poke the minor (young) and/or major (young + old) GC drivers. That problem is not easy to solve. However with JNI critical natives gone, the entire GC locker for ZGC is just a simple readers-writer lock, where critical native functions use the readers lock and the GC operations use the writer lock. The GC safepoints can't fail. > > With the new implementation that avoids doing a transition to native at all, the mentioned problem no longer occurs, as the safepoint synchronizer won't allow safepoints to creep in right in the middle of all this. So it would seem we are okay with that. So I think as long as we don't go with the previous async GC locker solution, we can remove ZGC interactions from the equation. > > However, you obviously get a trust problem instead with this flavour of cheating the system. Anything that takes a longish time in a critical native function without a native transition is going to be a disaster and hang the entire JVM. That is typically something we do not take lightly and is indeed why we have native transitions. > > So I would be delighted if we didn't resurrect ways of cheating the system anyway, unless this is absolutely... critical. It took a long time to get rid of the cheats. > > /Erik > > On 4 Jul 2022, at 18:07, Maurizio Cimadamore wrote: > > Hi, > while I'm not an expert with some of the IO calls you mention (some of my colleagues are more knowledgeable in this area, so I'm sure they will have more info), my general sense is that, as with getrusage, if there is a system call involved, you already pay a hefty price for the user to kernel transition. On my machine this seems to cost around 200ns.
In these cases, using JNI critical to shave off a dozen nanoseconds (at best!) seems just not worth it. > > So, of the functions in your list, the ones in which I *believe* dropping transitions would have the most effect are (if we exclude getpid, for which another approach is possible) clock_gettime and getcpu, I believe, as they might use vdso [1], which typically brings the performance of these calls closer to calls to shared lib functions. > > If you have examples e.g. where performance of recvmsg (or related calls) varies significantly between base JNI and critical JNI, please send them our way; I'm sure some of my colleagues would be interested to take a look. > > Popping back a couple of levels, I think it would be helpful to also define what's an acceptable regression in this context. Of course, in an ideal world, we'd like to see no performance regression at all. But JNI critical is an unsupported interface, which might misbehave with modern garbage collectors (e.g. ZGC) and that requires quite a bit of internal complexity which might, in the medium/long run, hinder the evolution of the Java platform (all these things have _some_ cost, even if the cost is not directly material to developers). In this vein, I think calls like clock_gettime tend to be more problematic: as they complete very quickly, you see the cost of transitions a lot more. In other cases, where syscalls are involved, the costs associated with transitions are more likely to be "in the noise". Of course if we look at absolute numbers, dropping transitions would always yield "faster" code; but at the same time, going from 250ns to 245ns is very unlikely to result in a visible performance difference when considering an application as a whole, so I think it's critical here to decide _which_ use cases to prioritize. > > I think a good outcome of this discussion would be if we could come to some shared understanding of which native calls are truly problematic (e.g.
clock_gettime-like), and then for the JDK to provide better (and more maintainable) alternatives for those (which might even be faster than using critical JNI). > > Thanks > Maurizio > > [1] - https://man7.org/linux/man-pages/man7/vdso.7.html > > On 04/07/2022 12:23, Wojciech Kudla wrote: > > Thanks Maurizio, > > I raised this case mainly about clock_gettime and recvmsg/sendmsg, I think we're focusing on the wrong things here. Feel free to drop the two syscalls from the discussion entirely, but the main usecases I have been presenting throughout this thread definitely stand. > > Thanks > > > On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore wrote: >> >> Hi Wojtek, >> thanks for sharing this list, I think this is a good starting point to understand more about your use case. >> >> Last week I've been looking at "getrusage" (as you mentioned it in an earlier email), and I was surprised to see that the call took a pointer to a (fairly big) struct which then needed to be initialized with some thread-local state: >> >> https://man7.org/linux/man-pages/man2/getrusage.2.html >> >> I've looked at the implementation, and it seems to be doing memset on the user-provided struct pointer, plus all the fields assignment. Eyeballing the implementation, this does not seem to me like a "classic" use case where dropping transition would help much. I mean, surely dropping transitions would help shaving some nanoseconds off the call, but it doesn't seem to me that the call would be shortlived enough to make a difference. Do you have some benchmarks on this one? I did some [1] and the call overhead seemed to come up at 260ns/op - w/o transition you might perhaps be able to get to 250ns, but that's in the noise? >> >> As for getpid, note that you can do (since Java 9): >> >> ProcessHandle.current().pid(); >> >> I believe the impl caches the result, so it shouldn't even make the native call. 
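A minimal sketch of the `ProcessHandle` approach mentioned above, assuming Java 9 or later. Whether the implementation caches the value is the poster's belief and is not verified here; the class name `PidSketch` is hypothetical.

```java
// Reads the current process id through the standard library (Java 9+),
// with no user-written JNI involved.
public class PidSketch {
    public static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("pid = " + currentPid());
    }
}
```

Since a process id does not change during the lifetime of the process, an application that needs it on a hot path can also read it once and keep it in a `static final` field.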
>> >> Maurizio >> >> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >> >> On 02/07/2022 07:42, Wojciech Kudla wrote: >> >> Hi Maurizio, >> >> Thanks for staying on this. >> >> > Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? >> >> Off the top of my head: >> clock_gettime >> recvmsg >> recvmmsg >> sendmsg >> sendmmsg >> select >> getpid >> getcpu >> getrusage >> >> > Also, could you please tell us whether any of these calls need to interact with Java arrays? >> No arrays or objects of any type involved. Everything happens by the means of passing raw pointers as longs and using other primitive types as function arguments. >> >> > In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? >> Critical JNI natives are used solely to remove the cost of transitions. We don't get anywhere near the Java heap in native code. >> >> In general I think it makes a lot of sense for Java as a language/platform to have some guards around unsafe code, but on the other hand the popularity of libraries employing Unsafe and their success in more performance-oriented corners of software engineering is a clear indicator there is a need for the JVM to provide access to more low-level primitives and mechanisms. >> I think it's entirely fair to tell developers that all bets are off when they get into some non-idiomatic scenarios but please don't take away a feature that greatly contributed to Java's success. >> >> Kind regards, >> Wojtek >> >> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore wrote: >>> >>> Hi Wojciech, >>> picking up this thread again. After some internal discussion, we realize that we don't know enough about your use case.
While re-enabling JNI critical would obviously provide a quick fix, we're afraid that (a) developers might end up depending on JNI critical when they don't need to (perhaps also unaware of the consequences of depending on it) and (b) that there might actually be _better_ (as in: much faster) solutions than using critical native calls to address at least some of your use cases (that seemed to be the case with the clock_gettime example you mentioned). Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? Also, could you please tell us whether any of these calls need to interact with Java arrays? In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? >>> >>> Regards >>> Maurizio >>> >>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>> >>> Hi Mark, >>> >>> Thanks for your input and apologies for the delayed response. >>> >>> > If the platform included, say, an intrinsified System.nanoRealTime() >>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>> that help developers in your unnamed industry? >>> >>> Exposing a realtime clock with nanosecond granularity in the JDK would be a great step forward. I should have made it clear that I represent the fintech corner (investment banking to be exact) but the issues my message touches upon span areas such as HPC, audio processing, gaming, and the defense industry so it's not like we have an isolated case. >>> >>> > In a similar vein, if people are finding it necessary to "replace parts >>> of NIO with hand-crafted native code"
then it would be interesting to >>> understand what their requirements are >>> >>> As for the other example I provided with making very short-lived syscalls such as recvmsg/recvmmsg the premise is getting access to hardware timestamps on the ingress and egress ends as well as enabling batch receive with a single syscall and otherwise exploiting features unavailable from the JDK (like access to CMSG interface, scatter/gather, etc). >>> There are also other examples of calls that we'd love to make often and at lowest possible cost (e.g. getrusage) but I'm not sure if there's a strong case for some of these ideas, that's why it might be worth looking into a more generic approach for performance sensitive code. >>> Hope this does a better job at explaining where we're coming from than my previous messages. >>> >>> Thanks, >>> W >>> >>> On Tue, Jun 7, 2022 at 6:31 PM wrote: >>>> >>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>>> >> CLOCK_REALTIME. >>>> > >>>> > Unfortunately System.currentTimeMillis() offers only millisecond >>>> > granularity which is the reason why our industry has to resort to >>>> > clock_gettime. >>>> >>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>> that help developers in your unnamed industry? >>>> >>>> In a similar vein, if people are finding it necessary to "replace parts >>>> of NIO with hand-crafted native code" then it would be interesting to >>>> understand what their requirements are. Some simple enhancements to >>>> the NIO API would be much less costly to design and implement than a >>>> generalized user-level native-call intrinsification mechanism.
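For context on the granularity point discussed above, here is a hedged sketch of reading wall-clock time through `java.time` instead of `System.currentTimeMillis()`. Since Java 9, `Clock.systemUTC()` can report sub-millisecond values where the operating system clock allows it; the actual resolution is platform-dependent, and this is not the hypothetical `System.nanoRealTime()` from the thread. The class name `WallClockSketch` is invented for illustration.

```java
import java.time.Clock;
import java.time.Instant;

// Wall-clock time expressed as nanoseconds since the epoch.
// The value has nanosecond *units*; the real resolution depends on the platform.
public class WallClockSketch {
    public static long epochNanos() {
        Instant now = Clock.systemUTC().instant();
        // Exact arithmetic so an overflow would throw instead of wrapping silently.
        return Math.addExact(
            Math.multiplyExact(now.getEpochSecond(), 1_000_000_000L),
            now.getNano());
    }

    public static void main(String[] args) {
        System.out.println("currentTimeMillis = " + System.currentTimeMillis());
        System.out.println("epochNanos        = " + epochNanos());
    }
}
```

This does not address the cost of the call itself, which is the other half of the concern raised in the thread.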
>>>> >>>> - Mark From stuefe at openjdk.org Wed Jul 6 13:40:33 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Jul 2022 13:40:33 GMT Subject: RFR: 8289745: JfrStructCopyFailed uses heap words instead of bytes for object sizes In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 14:08:24 GMT, Ralf Schmelter wrote: > The values for smallestSize, firstSize and totalSize in the CopyFailed type are set as the number of heap words, but should be number of bytes. This leads to wrong values in the PromotionFailed and EvacuationFailed JFR events containing this type. Good catch. IIUC this issue is a day zero bug, right? Bit scary that this was not found before. test/jdk/jdk/jfr/event/gc/detailed/PromotionFailedEvent.java line 56: > 54: System.out.println("Event: " + event); > 55: long smallestSize = Events.assertField(event, "promotionFailed.smallestSize").atLeast(1L).getValue(); > 56: Asserts.assertTrue((smallestSize % minObjectAlignment) == 0, "smallestSize " + smallestSize + " is not a valid size."); Testing for alignment is a good pragmatic way to check for regressions without adding more logic. Do the numbers include object headers? If yes, we could assert to >= 8 at least. ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/9378 From mgronlun at openjdk.org Wed Jul 6 11:42:27 2022 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 6 Jul 2022 11:42:27 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. 
Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Incorporate JIT compiler restart into EventSweepCodeCache Hi again Matthias, based on your observation that this does not really align well with the threshold parameter for SweepCodeCache and is a complement event to EventCodeCacheFull, I believe your original suggestion is better (an instant event, only for "jit restart"). I was slightly tripped up with the concept of a "JIT restart", because at this point the JIT is already stopped, as implied by EventCodeCacheFull. So "JIT restart" here contextually means, "JIT start", as in "start JITting code again". Do you know if we have an event that describes the CodeCache settings in terms of memory sizes so that the "freed memory" can be interpreted relative to it? ------------- PR: https://git.openjdk.org/jdk/pull/9334 From lucy at openjdk.org Wed Jul 6 14:43:28 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 6 Jul 2022 14:43:28 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Incorporate JIT compiler restart into EventSweepCodeCache Changes look good to me. To get an idea of how much memory was freed in relation to what's actually available, you could evaluate CodeCache::max_capacity().
This function returns the overall size of the code heap (across all segments). The performance impact is minimal and no locks are acquired. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.org/jdk/pull/9334 From jwilhelm at openjdk.org Wed Jul 6 13:11:34 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Wed, 6 Jul 2022 13:11:34 GMT Subject: RFR: Merge jdk19 Message-ID: Forwardport JDK 19 -> JDK 20 ------------- Commit messages: - Merge - 8289477: Memory corruption with CPU_ALLOC, CPU_FREE on muslc - 8289439: Clarify relationship between ThreadStart/ThreadEnd and can_support_virtual_threads capability - 8288128: S390X: Fix crashes after JDK-8284161 (Virtual Threads) - 8289091: move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj() - 8287847: Fatal Error when suspending virtual thread after it has terminated - 8067757: Incorrect HTML generation for copied javadoc with multiple @throws tags - 8289569: [test] java/lang/ProcessBuilder/Basic.java fails on Alpine/musl - 8287851: C2 crash: assert(t->meet(t0) == t) failed: Not monotonic - 8287672: jtreg test com/sun/jndi/ldap/LdapPoolTimeoutTest.java fails intermittently in nightly run - ... 
and 2 more: https://git.openjdk.org/jdk/compare/83a5d599...fbbc3300 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=jdk&pr=9397&range=00.0 - jdk19: https://webrevs.openjdk.org/?repo=jdk&pr=9397&range=00.1 Changes: https://git.openjdk.org/jdk/pull/9397/files Stats: 888 lines in 24 files changed: 705 ins; 64 del; 119 mod Patch: https://git.openjdk.org/jdk/pull/9397.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9397/head:pull/9397 PR: https://git.openjdk.org/jdk/pull/9397 From coleenp at openjdk.org Wed Jul 6 15:16:04 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Jul 2022 15:16:04 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive Message-ID: This trivial change just adds a comment to Klass::is_loader_alive. ------------- Commit messages: - change a word - 8278923: Document Klass::is_loader_alive Changes: https://git.openjdk.org/jdk/pull/9400/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9400&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8278923 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9400.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9400/head:pull/9400 PR: https://git.openjdk.org/jdk/pull/9400 From mgronlun at openjdk.org Wed Jul 6 11:49:52 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 6 Jul 2022 11:49:52 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events.
Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Incorporate JIT compiler restart into EventSweepCodeCache So the timestamp of this "JIT restart" event minus the timestamp of the previous "CodeCacheFull" event is the duration where JIT compilation is disabled, the reason being no memory available to accommodate new code. That's a good data point. ------------- PR: https://git.openjdk.org/jdk/pull/9334 From rehn at openjdk.org Wed Jul 6 07:49:07 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Jul 2022 07:49:07 GMT Subject: RFR: 8286957: Held monitor count [v5] In-Reply-To: References: Message-ID: > The current implementation do not count all monitor enter, counts high up in abstraction and causes a performance regression on aarch64 with some benchmarks due to C2 changes. > > This change makes the counting exact by pushing the counting down in the abstraction. > The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". > > An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. > > Fixed aarch64, x64, x86 and zero. > > Passes t1-8 Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Merge branch 'master' into held-mon-count - Merge branch 'master' into held-mon-count - Fixed var name - Merge branch 'master' into held-mon-count - Merge branch 'master' into held-mon-count - 8286957 - PR Baseline ------------- Changes: https://git.openjdk.org/jdk/pull/8945/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=04 Stats: 517 lines in 43 files changed: 301 ins; 143 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/8945.diff Fetch: git fetch https://git.openjdk.org/jdk pull/8945/head:pull/8945 PR: https://git.openjdk.org/jdk/pull/8945 From rehn at openjdk.org Wed Jul 6 13:43:35 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Jul 2022 13:43:35 GMT Subject: RFR: 8286957: Held monitor count [v8] In-Reply-To: References: Message-ID: > The current implementation do not count all monitor enter, counts high up in abstraction and causes a performance regression on aarch64 with some benchmarks due to C2 changes. > > This change makes the counting exact by pushing the counting down in the abstraction. > The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". > > An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. > > Fixed aarch64, x64, x86 and zero. 
> > Passes t1-8 Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Fixed strw, zero rename and made methods return 64 bit counter in all cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/8945/files - new: https://git.openjdk.org/jdk/pull/8945/files/1bfb9c7b..b69bc54e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=06-07 Stats: 16 lines in 3 files changed: 0 ins; 5 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/8945.diff Fetch: git fetch https://git.openjdk.org/jdk pull/8945/head:pull/8945 PR: https://git.openjdk.org/jdk/pull/8945 From egahlin at openjdk.org Wed Jul 6 12:23:39 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 6 Jul 2022 12:23:39 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Incorporate JIT compiler restart into EventSweepCodeCache It would be good to add a sanity check. See: https://github.com/openjdk/jdk/blob/83a5d5996bca26b5f2e97b67f9bfd0a6ad110327/test/jdk/jdk/jfr/event/compiler/TestCodeSweeper.java#L181 The test in on the ProblemList.txt due to timeouts, but probably works most of the time if you run it. 
Also remove "JitRestart" from TestLookForUntestedEvents.java ------------- PR: https://git.openjdk.org/jdk/pull/9334 From rehn at openjdk.org Wed Jul 6 13:05:27 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Jul 2022 13:05:27 GMT Subject: RFR: 8286957: Held monitor count [v7] In-Reply-To: References: Message-ID: > The current implementation do not count all monitor enter, counts high up in abstraction and causes a performance regression on aarch64 with some benchmarks due to C2 changes. > > This change makes the counting exact by pushing the counting down in the abstraction. > The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". > > An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. > > Fixed aarch64, x64, x86 and zero. > > Passes t1-8 Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Fixed return value truncation and prep CA for 32 bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/8945/files - new: https://git.openjdk.org/jdk/pull/8945/files/85853052..1bfb9c7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=05-06 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/8945.diff Fetch: git fetch https://git.openjdk.org/jdk pull/8945/head:pull/8945 PR: https://git.openjdk.org/jdk/pull/8945 From jiefu at openjdk.org Wed Jul 6 07:46:41 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 6 Jul 2022 07:46:41 GMT Subject: RFR: JDK-8289799: Build warning in methodData.cpp memset zero-length parameter In-Reply-To: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> References: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> Message-ID: On Wed, 
6 Jul 2022 07:23:17 GMT, Thomas Stuefe wrote: > Trivial fix for a compiler warning we see in our CI on Fedora 12 with GCC 8.3: > > > void Copy::pd_zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/cpu/x86/copy_x86.hpp:59:15, > inlined from 'static void Copy::zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/share/utilities/copy.hpp:298:21, > inlined from 'void MethodData::initialize()' at Looks good to me. ------------- Marked as reviewed by jiefu (Reviewer). PR: https://git.openjdk.org/jdk/pull/9390 From dlong at openjdk.org Wed Jul 6 02:01:46 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Jul 2022 02:01:46 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: <3VeghgUTahQwYWfJSEZMhXoX7rg_E7gES203lyu_HR0=.be938670-d295-463a-870f-9af8b5ff623c@github.com> On Tue, 5 Jul 2022 08:31:31 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. 
>> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: > > - Add an "i2i" entry to enterSpecial > - Fix comment src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1050: > 1048: OopMap* map = continuation_enter_setup(masm, stack_slots); > 1049: // The frame is complete here, but we only record it for the compiled entry, so the frame would appear unsafe, > 1050: // but that's okay because at the very worst we'll miss an async sample, but we're in interp_only_mode anyeay. Suggestion: // but that's okay because at the very worst we'll miss an async sample, but we're in interp_only_mode anyway. ------------- PR: https://git.openjdk.org/jdk19/pull/66 From aph at openjdk.org Wed Jul 6 13:53:22 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 6 Jul 2022 13:53:22 GMT Subject: Integrated: 8288992: AArch64: CMN should be handled the same way as CMP In-Reply-To: References: Message-ID: On Wed, 22 Jun 2022 17:03:42 GMT, Andrew Haley wrote: > At present, `cmp(r8, -1)` fails at compile time, but `cmn(r8, -1)` fails at runtime. We should fix cmn() to be the same as `cmp()`. > > After this change, it's much less likely that we'll be surprised by immediate overflows in `cmn()`. This pull request has now been integrated. 
Changeset: cc2b7927 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/cc2b79270445ccfb2181894fed2edfd4518a2904 Stats: 9 lines in 2 files changed: 3 ins; 0 del; 6 mod 8288992: AArch64: CMN should be handled the same way as CMP Reviewed-by: adinn, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/9246 From mbaesken at openjdk.org Wed Jul 6 14:33:29 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 6 Jul 2022 14:33:29 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Wed, 6 Jul 2022 12:56:03 GMT, Markus Grönlund wrote: > Yes, thank you, I took a look at those fields too. Unfortunately, there is no value that bears directly on "current in-use". The reserved and committed will not be updated. If there was a value that could reflect the current usage, even if it means adding it to both EventCodeCacheFull and the JIT restart, that would be perfect. EventCodeCacheFull because in-use is x, JIT restart because in-use is now y. There are some statistics-related properties in the CodeHeap, like for example heap->unallocated_capacity(); et al. Could some of those expose this running value perhaps? Probably we would need CodeCache::unallocated_capacity() (contains a number of code heaps as far as I know) to compare freed_memory to, because from what I see we iterate the whole CodeCache in NMethodSweeper::sweep_code_cache(). But maybe someone with better knowledge about the CodeCache should comment here.
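Earlier in this thread it was noted that the timestamp of the "JIT restart" event minus the timestamp of the preceding "CodeCacheFull" event gives the time during which JIT compilation was disabled. A minimal Java sketch of that subtraction follows. It works on a plain in-memory list rather than a real recording; the Event class and the event names "CodeCacheFull" and "JitRestart" are stand-ins chosen for illustration, and measuring from the first "full" event of a stopped period is an assumption of this sketch, not something the thread specifies.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class JitDisabledIntervals {
    /** Minimal stand-in for a recorded event: a name and a timestamp. */
    static final class Event {
        final String name;
        final Instant time;

        Event(String name, Instant time) {
            this.name = name;
            this.time = time;
        }
    }

    /**
     * Walks a time-ordered event list, pairs each "CodeCacheFull" with the
     * next "JitRestart", and returns the gap between the two timestamps.
     */
    static List<Duration> disabledIntervals(List<Event> events) {
        List<Duration> gaps = new ArrayList<>();
        Instant fullAt = null;
        for (Event e : events) {
            if (e.name.equals("CodeCacheFull")) {
                // Assumption: the stopped period starts at the first "full" event.
                if (fullAt == null) {
                    fullAt = e.time;
                }
            } else if (e.name.equals("JitRestart") && fullAt != null) {
                gaps.add(Duration.between(fullAt, e.time));
                fullAt = null;
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2022-07-06T12:00:00Z");
        List<Event> events = List.of(
                new Event("CodeCacheFull", t0),
                new Event("JitRestart", t0.plusSeconds(3)));
        System.out.println(disabledIntervals(events));
    }
}
```

With a real recording the same pairing could be driven from parsed JFR events instead of the hand-built list.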
------------- PR: https://git.openjdk.org/jdk/pull/9334 From stuefe at openjdk.org Wed Jul 6 07:29:56 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Jul 2022 07:29:56 GMT Subject: RFR: JDK-8289799: Build warning in methodData.cpp memset zero-length parameter Message-ID: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> Trivial fix for a compiler warning we see in our CI on Fedora 12 with GCC 8.3: void Copy::pd_zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/cpu/x86/copy_x86.hpp:59:15, inlined from 'static void Copy::zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/share/utilities/copy.hpp:298:21, inlined from 'void MethodData::initialize()' at ------------- Commit messages: - Test for size > 0 before zeroing Changes: https://git.openjdk.org/jdk/pull/9390/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9390&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289799 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9390.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9390/head:pull/9390 PR: https://git.openjdk.org/jdk/pull/9390 From rehn at openjdk.org Wed Jul 6 07:49:09 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Jul 2022 07:49:09 GMT Subject: RFR: 8286957: Held monitor count [v4] In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 13:50:21 GMT, Erik Österlund wrote: > Looks good. We might however want the counter to be 64 bit so we don't have to think about overflows. I suppose nasty JNI code could lock the entire heap and then unlock it. Thank you Erik. I'll go ahead and change to a 64-bit counter.
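The overflow concern behind the move to a 64-bit counter can be illustrated with a small Java sketch. This is only an analogy: the real held-monitor counter lives in HotSpot's C++ and generated code, and the class and method names below (CounterWidth, incrementInt, incrementLong) are invented for the example.

```java
public class CounterWidth {
    /** 32-bit increment: wraps silently once the count passes Integer.MAX_VALUE. */
    static int incrementInt(int count) {
        return count + 1;
    }

    /** 64-bit increment: the same step stays positive far beyond the 32-bit range. */
    static long incrementLong(long count) {
        return count + 1;
    }

    public static void main(String[] args) {
        int narrow = incrementInt(Integer.MAX_VALUE);   // wraps to Integer.MIN_VALUE
        long wide = incrementLong(Integer.MAX_VALUE);   // 2147483648
        System.out.println("32-bit counter after one more lock: " + narrow);
        System.out.println("64-bit counter after one more lock: " + wide);
    }
}
```

A wrapped counter would make a thread that holds billions of monitors look as if it held a negative number, which is the case the wider type is meant to rule out.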
------------- PR: https://git.openjdk.org/jdk/pull/8945 From eosterlund at openjdk.org Wed Jul 6 11:23:35 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 6 Jul 2022 11:23:35 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 08:55:10 GMT, Ron Pressler wrote: >> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1058: >> >>> 1056: >>> 1057: address mark = __ pc(); >>> 1058: __ trampoline_call1(resolve, NULL, false); >> >> I don't think it's necessary to call the resolve stub when in interpreted mode. Can't we just call the Method's c2i adapter just like the interpreter would? I guess there might be a startup issue if the adapter hasn't been generated yet. > > I couldn't find code that does that and could be easily reused. The callee belongs to the same class as the caller, so they should both be linked in this context. ------------- PR: https://git.openjdk.org/jdk19/pull/66 From mbaesken at openjdk.org Wed Jul 6 12:39:43 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 6 Jul 2022 12:39:43 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Wed, 6 Jul 2022 11:38:48 GMT, Markus Grönlund wrote: > Hi again Matthias, based on your observation that this does not really align well with the threshold parameter for SweepCodeCache and is a complement event to EventCodeCacheFull, I believe your original suggestion is better (an instant event, only for "jit restart"). I was slightly tripped up with the concept of a "JIT restart", because at this point the JIT is already stopped, as implied by EventCodeCacheFull. So "JIT restart" here contextually means, "JIT start", as in "start JITting code again".
Do you know if we have an event that describes the CodeCache settings in terms of memory sizes so that the "freed memory" can be interpreted relative to it? Hi Markus, at least we store a couple of addresses in the EventCodeCacheFull event. Those could potentially be used for interpretation. EventCodeCacheFull event; event.set_startAddress((u8)heap->low_boundary()); event.set_commitedTopAddress((u8)heap->high()); event.set_reservedTopAddress((u8)heap->high_boundary()); ------------- PR: https://git.openjdk.org/jdk/pull/9334 From fgao at openjdk.org Wed Jul 6 08:01:52 2022 From: fgao at openjdk.org (Fei Gao) Date: Wed, 6 Jul 2022 08:01:52 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 Message-ID: Superword doesn't vectorize any nodes of non-primitive types and thus sets `allow_address` false when calling type2aelembytes() in SuperWord::data_size()[1]. Therefore, when we try to resolve the data size for a node of T_ADDRESS type, the assertion in type2aelembytes()[2] takes effect. We try to resolve the data sizes for node s and node t in the SuperWord::adjust_alignment_for_type_conversion()[3] when type conversion between different data sizes happens. The issue is, when node s is a ConvI2L node and node t is an AddP node of T_ADDRESS type, type2aelembytes() will assert. To fix it, we should filter out all non-primitive nodes, like the patch does in SuperWord::adjust_alignment_for_type_conversion(). Since it's a failure in the mid-end, all superword available platforms are affected. In my local test, this failure can be reproduced on both x86 and aarch64. With this patch, the failure can be fixed. Apart from fixing the bug, the patch also adds necessary type check and does some clean-up in SuperWord::longer_type_for_conversion() and VectorCastNode::implemented(). 
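The shape of the fix described above, filtering out non-primitive types before asking for their data size, can be sketched in Java. This is a simplified model and not the HotSpot code: the enum, its byte sizes, and the method names (dataSize, isSizeChangingConversion) are assumptions made for illustration, with dataSize standing in for an asserting lookup such as type2aelembytes() called with allow_address set to false.

```java
public class ElementSizeGuard {
    /** Tiny model of a basic-type table; T_ADDRESS is the non-primitive case. */
    enum BasicType {
        T_BYTE(1, true),
        T_SHORT(2, true),
        T_INT(4, true),
        T_LONG(8, true),
        T_ADDRESS(8, false);   // pointer-sized here, but not a vectorizable element type

        final int bytes;
        final boolean primitive;

        BasicType(int bytes, boolean primitive) {
            this.bytes = bytes;
            this.primitive = primitive;
        }
    }

    /** Stand-in for the asserting size lookup: rejects non-primitive types. */
    static int dataSize(BasicType t) {
        if (!t.primitive) {
            throw new IllegalArgumentException("non-primitive type: " + t);
        }
        return t.bytes;
    }

    /** Guarded check: sizes are only compared when both types are primitive. */
    static boolean isSizeChangingConversion(BasicType from, BasicType to) {
        if (!from.primitive || !to.primitive) {
            return false;   // filtered out before dataSize() can reject it
        }
        return dataSize(from) != dataSize(to);
    }

    public static void main(String[] args) {
        System.out.println(isSizeChangingConversion(BasicType.T_INT, BasicType.T_LONG));
        System.out.println(isSizeChangingConversion(BasicType.T_INT, BasicType.T_ADDRESS));
    }
}
```

Without the guard, the second call in main would reach dataSize with T_ADDRESS and fail, which is the analogue of the assertion reported in the bug.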
[1]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1417 [2]https://github.com/openjdk/jdk/blob/b96ba19807845739b36274efb168dd048db819a3/src/hotspot/share/utilities/globalDefinitions.cpp#L326 [3]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1454 ------------- Commit messages: - 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 Changes: https://git.openjdk.org/jdk/pull/9391/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9391&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288883 Stats: 116 lines in 5 files changed: 89 ins; 9 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/9391.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9391/head:pull/9391 PR: https://git.openjdk.org/jdk/pull/9391 From rehn at openjdk.org Wed Jul 6 12:53:37 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Jul 2022 12:53:37 GMT Subject: RFR: 8286957: Held monitor count [v6] In-Reply-To: References: Message-ID: > The current implementation do not count all monitor enter, counts high up in abstraction and causes a performance regression on aarch64 with some benchmarks due to C2 changes. > > This change makes the counting exact by pushing the counting down in the abstraction. > The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". > > An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. > > Fixed aarch64, x64, x86 and zero. 
> > Passes t1-8 Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Use 64 counter on 64-bit platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/8945/files - new: https://git.openjdk.org/jdk/pull/8945/files/0fee7ccd..85853052 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=8945&range=04-05 Stats: 77 lines in 13 files changed: 24 ins; 12 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/8945.diff Fetch: git fetch https://git.openjdk.org/jdk pull/8945/head:pull/8945 PR: https://git.openjdk.org/jdk/pull/8945 From duke at openjdk.org Wed Jul 6 12:01:33 2022 From: duke at openjdk.org (Johannes Bechberger) Date: Wed, 6 Jul 2022 12:01:33 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Incorporate JIT compiler restart into EventSweepCodeCache Marked as reviewed by parttimenerd at github.com (no known OpenJDK username). 
------------- PR: https://git.openjdk.org/jdk/pull/9334 From dholmes at openjdk.org Wed Jul 6 01:27:32 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Jul 2022 01:27:32 GMT Subject: RFR: 8289780: Remove Forte::register_stub In-Reply-To: References: Message-ID: <9iGaC4aFHNx6w3kzoJNFvidssIKlNdNl5TFB1MxhoTI=.1feed3a4-92a2-4272-86b8-38c310303a3e@github.com> On Wed, 6 Jul 2022 00:38:14 GMT, Ioi Lam wrote: > I removed `Remove Forte::register_stub` since it's used only by the Solaris Forte(TM) Performance Tools collector, which is no longer supported by the JDK. > > I also fixed a couple of places where the stub name is computed unnecessarily. > > Also renamed some `#ifndef IA64` around the code that I touched. Cleanup looks good! Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9386 From dlong at openjdk.org Wed Jul 6 02:18:45 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Jul 2022 02:18:45 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 08:31:31 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. 
In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and support an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: > > - Add an "i2i" entry to enterSpecial > - Fix comment src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1222: > 1220: OopMapSet* oop_maps = new OopMapSet(); > 1221: int interpreted_entry_offset = -1; > 1222: int compiled_entry_offset = -1; `compiled_entry_offset` is unused src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1535: > 1533: OopMapSet* oop_maps = new OopMapSet(); > 1534: int interpreted_entry_offset = -1; > 1535: int compiled_entry_offset = -1; `compiled_entry_offset` is unused ------------- PR: https://git.openjdk.org/jdk19/pull/66 From duke at openjdk.org Wed Jul 6 07:34:50 2022 From: duke at openjdk.org (Tongbao Zhang) Date: Wed, 6 Jul 2022 07:34:50 GMT Subject: RFR: 8289436: Make the redefine timer statistics more accurate In-Reply-To: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> References: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com> Message-ID: On Wed, 29 Jun 2022 08:30:12 GMT, Tongbao Zhang wrote: > Make the redefine timer statistics more accurate > > After some significant performance improvements of the class redefinition, like: > https://bugs.openjdk.org/browse/JDK-8139551 > https://bugs.openjdk.org/browse/JDK-8078725 > > Some time-consuming operations were moved out of "redefine_single_class" > So the sum of the phase 1 and phase 2 times no longer accurately represents the time of "vmop_doit" Thanks for the review ------------- PR:
https://git.openjdk.org/jdk/pull/9322 From iklam at openjdk.org Wed Jul 6 17:50:54 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 6 Jul 2022 17:50:54 GMT Subject: RFR: 8289780: Remove Forte::register_stub [v2] In-Reply-To: References: Message-ID: > I removed `Remove Forte::register_stub` since it's used only by the Solaris Forte(TM) Performance Tools collector, which is no longer supported by the JDK. > > I also fixed a couple of places where the stub name is computed unnecessarily. > > Also renamed some `#ifndef IA64` around the code that I touched. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Do not remove Forte::register_stub as it is used on Linux as well ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9386/files - new: https://git.openjdk.org/jdk/pull/9386/files/a1523c9d..609350f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9386&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9386&range=00-01 Stats: 117 lines in 7 files changed: 111 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/9386.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9386/head:pull/9386 PR: https://git.openjdk.org/jdk/pull/9386 From iklam at openjdk.org Wed Jul 6 17:27:44 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 6 Jul 2022 17:27:44 GMT Subject: RFR: 8289164: Convert ResolutionErrorTable to use ResourceHashtable [v2] In-Reply-To: References: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com> Message-ID: On Fri, 1 Jul 2022 18:48:54 GMT, Justin Gu wrote: >> Please review my change of converting the resolutionErrorTable from hashtable to resource hashtable. I tested my changes with a mach5 tier1-4 test. > > Justin Gu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: > > 8289164: Convert ResolutionErrorTable to use ResourceHashtable New version looks good to me. ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.org/jdk/pull/9337 From mcimadamore at openjdk.org Wed Jul 6 17:16:56 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 6 Jul 2022 17:16:56 GMT Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v4] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 17:07:37 GMT, Jorn Vernee wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert implicit vs. heap session changes > > src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/macos/MacOsAArch64VaList.java line 172: > >> 170: >> 171: public Builder(MemorySession session) { >> 172: ((MemorySessionImpl)session).checkValidState(); > > Or here, if the memory session is a non-closeable view. I believe there was a wrong renaming with the IDE here, I will fix this ------------- PR: https://git.openjdk.org/jdk19/pull/22 From jvernee at openjdk.org Wed Jul 6 17:11:54 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 6 Jul 2022 17:11:54 GMT Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v4] In-Reply-To: References: Message-ID: On Fri, 17 Jun 2022 18:39:03 GMT, Maurizio Cimadamore wrote: >> This is a JDK 19 clone of: https://github.com/openjdk/jdk/pull/9017 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Revert implicit vs. 
heap session changes src/java.base/share/classes/java/lang/invoke/X-VarHandleSegmentView.java.template line 131: > 129: AbstractMemorySegmentImpl bb = checkAddress(obb, base, handle.length, true); > 130: #if[floatingPoint] > 131: $rawType$ rawValue = SCOPED_MEMORY_ACCESS.get$RawType$Unaligned(bb.session(), For instance, it's not clear to me why `baseSession()` is not called here. src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/macos/MacOsAArch64VaList.java line 172: > 170: > 171: public Builder(MemorySession session) { > 172: ((MemorySessionImpl)session).checkValidState(); Or here, if the memory session is a non-closeable view. ------------- PR: https://git.openjdk.org/jdk19/pull/22 From dnsimon at openjdk.org Wed Jul 6 16:54:51 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 6 Jul 2022 16:54:51 GMT Subject: RFR: 8282420: JFR: Remove event handlers [v6] In-Reply-To: References: Message-ID: On Tue, 10 May 2022 14:42:58 GMT, Erik Gahlin wrote: >> Hi, >> >> Could I have a review of a fix that removes event handler classes for JFR. Bytecode for event instrumentation is now only added to the event class. Benefits are: >> >> - No class memory leak in the boot class loader. >> - Reduce overhead from class loading during startup, which is important with additional JDK events that are coming (VirtualThreadStart etc.) >> - One less frame to traverse when recording a Java stack trace. >> >> Future benefits are: >> >> - Simplify creating instrumentation as a build step. See https://bugs.openjdk.java.net/browse/JDK-8279354 >> - Simplify implementation of Event Metrics. See https://bugs.openjdk.java.net/browse/JDK-8224749 >> >> When the Security Manager is removed, much of the code being added for security reasons can be deleted. >> >> There are few JFR hooks when code is being linked. Plan is to also use these for other events later. 
>> >> Testing: tier 1-4, jdk/jdk/jfr >> >> Thanks >> Erik > > Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: > > Minor fixes src/hotspot/share/jfr/instrumentation/jfrResolution.hpp line 40: > 38: public: > 39: static void on_runtime_resolution(const CallInfo & info, TRAPS); > 40: static void on_c1_resolution(const GraphBuilder * builder, const ciKlass * holder, const ciMethod * target); Should the declarations of `on_c1_resolution` and `on_c2_resolution` be guarded with `COMPILER2_PRESENT` and `COMPILER1_PRESENT` since the method bodies are? ------------- PR: https://git.openjdk.org/jdk/pull/8383 From mcimadamore at openjdk.org Wed Jul 6 18:01:28 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 6 Jul 2022 18:01:28 GMT Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v5] In-Reply-To: References: Message-ID: > This is a JDK 19 clone of: https://github.com/openjdk/jdk/pull/9017 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Merge branch 'master' into memory_session_cleanup - Fix ambiguity between session vs. base session - Revert implicit vs. heap session changes - Unify heap vs. 
implicit scopes - Merge branch 'master' into memory_session_cleanup - Fix issue in Direct-X-Buffer template - Simplify and drop the state class - Add missing files - Initial push ------------- Changes: https://git.openjdk.org/jdk19/pull/22/files Webrev: https://webrevs.openjdk.org/?repo=jdk19&pr=22&range=04 Stats: 484 lines in 24 files changed: 56 ins; 107 del; 321 mod Patch: https://git.openjdk.org/jdk19/pull/22.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/22/head:pull/22 PR: https://git.openjdk.org/jdk19/pull/22 From coleenp at openjdk.org Wed Jul 6 17:04:40 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Jul 2022 17:04:40 GMT Subject: RFR: 8289780: Remove Forte::register_stub In-Reply-To: References: Message-ID: <7E0l2wIjQUx_Xd45WIODZS-Gt-lSlhmeqYwuK3rs4aE=.9fb4a2b7-d23b-4b5a-ab01-f248f520cf1d@github.com> On Wed, 6 Jul 2022 00:38:14 GMT, Ioi Lam wrote: > I removed `Remove Forte::register_stub` since it's used only by the Solaris Forte(TM) Performance Tools collector, which is no longer supported by the JDK. > > I also fixed a couple of places where the stub name is computed unnecessarily. > > Also renamed some `#ifndef IA64` around the code that I touched. This looks good. Leaving the name forte.cpp seems odd, even though some still refer to it that way. My vote would have been for asyncGetCallTrace.cpp ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/9386 From jvernee at openjdk.org Wed Jul 6 17:11:55 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 6 Jul 2022 17:11:55 GMT Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v4] In-Reply-To: References: Message-ID: <37B_wovfu8uoaM0hq4X1PG3xccVavpwYiZcJ9uNpqTM=.e417015c-9cef-4125-88cb-346b0a80471e@github.com> On Wed, 6 Jul 2022 17:05:51 GMT, Jorn Vernee wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert implicit vs. 
heap session changes > > src/java.base/share/classes/java/lang/invoke/X-VarHandleSegmentView.java.template line 131: > >> 129: AbstractMemorySegmentImpl bb = checkAddress(obb, base, handle.length, true); >> 130: #if[floatingPoint] >> 131: $rawType$ rawValue = SCOPED_MEMORY_ACCESS.get$RawType$Unaligned(bb.session(), > > For instance, it's not clear to me why `baseSession()` is not called here. It seems that, if `get$RawType$Unaligned` calls `checkValidStateRaw()` this will end up throwing an exception ------------- PR: https://git.openjdk.org/jdk19/pull/22 From mgronlun at openjdk.org Wed Jul 6 11:28:32 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 6 Jul 2022 11:28:32 GMT Subject: RFR: 8289745: JfrStructCopyFailed uses heap words instead of bytes for object sizes In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 14:08:24 GMT, Ralf Schmelter wrote: > The values for smallestSize, firstSize and totalSize in the CopyFailed type are set as the number of heap words, but should be number of bytes. This leads to wrong values in the PromotionFailed and EvacuationFailed JFR events containing this type. Marked as reviewed by mgronlun (Reviewer). Great Ralf, thank you for finding and fixing this. ------------- PR: https://git.openjdk.org/jdk/pull/9378 From stuefe at openjdk.org Wed Jul 6 16:18:42 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Jul 2022 16:18:42 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 02:17:39 GMT, Jie Fu wrote: > Hi all, > > ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. > > For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. > And `line` will be allocated by `getline` @line84.
> > > 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { > 54 char* line_mountpoint = NULL; > 55 char* line_filesystem = NULL; > 56 > 57 // Parse line and return a newly allocated string containing the mount point if > 58 // the line contains a matching filesystem and the mount point is accessible by > 59 // the current user. > 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || > 61 strcmp(line_filesystem, filesystem) != 0 || > 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { > 63 // Not a matching or accessible filesystem > 64 os::free(line_mountpoint); > 65 line_mountpoint = NULL; > 66 } > 67 > 68 os::free(line_filesystem); > 69 > 70 return line_mountpoint; > 71 } > 72 > 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { > 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); > 75 if (fd == NULL) { > 76 ZErrno err; > 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); > 78 return; > 79 } > 80 > 81 char* line = NULL; > 82 size_t length = 0; > 83 > 84 while (getline(&line, &length, fd) != -1) { > 85 char* const mountpoint = get_mountpoint(line, filesystem); > 86 if (mountpoint != NULL) { > 87 mountpoints->append(mountpoint); > 88 } > 89 } > 90 > 91 os::free(line); > 92 fclose(fd); > 93 } > > > See the analysis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 > > That means we have raw `::malloc() -> os::free()`, which is unbalanced. > Raw `::malloc()` does not write the header `os::free()` expects. > If NMT is on, we assert now, because NMT does not find its header in os::free(). > > > The fix just reverts `os::free()` to `::free()`. > > Testing: > - hotspot/jtreg/gc/z on Linux/x64, all passed > > Thanks. > Best regards, > Jie Hi @DamonFool , thanks for the quick fix. Embarrassing, I should have caught this. Can you please add `#include "utilities/globalDefinitions.hpp"` for the ALLOW_...
macros? Otherwise, apart from the comment changes below, fine. Cheers, Thomas src/hotspot/os/linux/gc/z/zMountPoint_linux.cpp line 64: > 62: access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { > 63: // Not a matching or accessible filesystem > 64: ALLOW_C_FUNCTION(::free, ::free(line_mountpoint);) // *not* os::free Can you please extend the comment like this: "sscanf, using %m, will return malloc'd memory. Needs raw ::free, not os::free." src/hotspot/os/linux/gc/z/zMountPoint_linux.cpp line 91: > 89: } > 90: > 91: ALLOW_C_FUNCTION(::free, ::free(line);) // *not* os::free Please extend comment: "getline will return malloc'd memory. Needs raw ::free, not os::free." ------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/9387 From tschatzl at openjdk.org Wed Jul 6 09:43:23 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 6 Jul 2022 09:43:23 GMT Subject: Integrated: 8289739: Add G1 specific GC breakpoints for testing In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 11:35:19 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that adds a few G1 specific GC breakpoints for future testing [JDK-8289740](https://bugs.openjdk.org/browse/JDK-8289740). > > Testing: gha, local testing > > Thanks, > Thomas This pull request has now been integrated.
Changeset: 83418952 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/834189527e16d6fc3aedb97108b0f74c391dbc3b Stats: 23 lines in 3 files changed: 23 ins; 0 del; 0 mod 8289739: Add G1 specific GC breakpoints for testing Reviewed-by: kbarrett, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/9376 From stuefe at openjdk.org Wed Jul 6 16:22:47 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Jul 2022 16:22:47 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 02:17:39 GMT, Jie Fu wrote: > Hi all, > > ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. > > For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. > And `line` will be allocated by `getline` @line84. > > > 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { > 54 char* line_mountpoint = NULL; > 55 char* line_filesystem = NULL; > 56 > 57 // Parse line and return a newly allocated string containing the mount point if > 58 // the line contains a matching filesystem and the mount point is accessible by > 59 // the current user. 
> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || > 61 strcmp(line_filesystem, filesystem) != 0 || > 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { > 63 // Not a matching or accessible filesystem > 64 os::free(line_mountpoint); > 65 line_mountpoint = NULL; > 66 } > 67 > 68 os::free(line_filesystem); > 69 > 70 return line_mountpoint; > 71 } > 72 > 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { > 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); > 75 if (fd == NULL) { > 76 ZErrno err; > 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); > 78 return; > 79 } > 80 > 81 char* line = NULL; > 82 size_t length = 0; > 83 > 84 while (getline(&line, &length, fd) != -1) { > 85 char* const mountpoint = get_mountpoint(line, filesystem); > 86 if (mountpoint != NULL) { > 87 mountpoints->append(mountpoint); > 88 } > 89 } > 90 > 91 os::free(line); > 92 fclose(fd); > 93 } > > > See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 > > That means we have raw `::malloc() -> os::free()`, which is unbalanced. > Raw `::malloc()` does not write the header `os::free()` expects. > If NMT is on, we assert now, because NMT does not find its header in os::free(). > > > The fix just reverts `os::free()` to `::free()`. > > Testing: > - hotspot/jtreg/gc/z on Linux/x64, all passed > > Thanks. > Best regards, > Jie I'm curious, which tests did show the bug? Because I ran GHAs and a selection of our nightlies. 
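For readers following this thread, the allocation mismatch under discussion can be sketched outside HotSpot roughly as below. This is only an illustration, not the actual patch: `mountpoint_for` is a hypothetical helper, the `access()` check from the quoted code is left out, and the sketch assumes a glibc-style `sscanf` that supports the `%ms` conversion. The point it tries to show is that strings produced by `%ms` come from the C library's `malloc`, so they are released with the raw `::free` rather than a wrapper allocator's free function.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not part of HotSpot): returns the mount point named in
// one /proc/self/mountinfo-style line if its filesystem type matches, else "".
std::string mountpoint_for(const char* line, const char* filesystem) {
  assert(line != nullptr && filesystem != nullptr);

  char* line_mountpoint = nullptr;
  char* line_filesystem = nullptr;
  std::string result;

  // With %ms, sscanf itself allocates the output strings using malloc.
  const int matched = std::sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms",
                                  &line_mountpoint, &line_filesystem);
  if (matched == 2 && std::strcmp(line_filesystem, filesystem) == 0) {
    result = line_mountpoint;  // copy before releasing the C string
  }

  // The memory came from raw malloc, so it goes back through raw free.
  // A wrapper that expects its own bookkeeping header must not be used here.
  ::free(line_mountpoint);
  ::free(line_filesystem);
  return result;
}
```

The same reasoning would apply to a buffer filled in by `getline`: whoever allocated the memory determines which deallocation function is the matching one.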
------------- PR: https://git.openjdk.org/jdk/pull/9387 From rehn at openjdk.org Wed Jul 6 13:43:38 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Jul 2022 13:43:38 GMT Subject: RFR: 8286957: Held monitor count [v7] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:05:27 GMT, Robbin Ehn wrote: >> The current implementation does not count every monitor enter, counts high up in the abstraction, and causes a performance regression on aarch64 with some benchmarks due to C2 changes. >> >> This change makes the counting exact by pushing the counting down in the abstraction. >> The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". >> >> An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. >> >> Fixed aarch64, x64, x86 and zero. >> >> Passes t1-8 > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed return value truncation and prep CA for 32 bit Starting retesting. ------------- PR: https://git.openjdk.org/jdk/pull/8945 From eosterlund at openjdk.org Wed Jul 6 11:23:33 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 6 Jul 2022 11:23:33 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v4] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 09:44:23 GMT, Ron Pressler wrote: >> Please review the following bug fix: >> >> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. >> >> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1.
When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with one additional commit since the last revision: > > Changes following review comments Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk19/pull/66 From dlong at openjdk.org Wed Jul 6 19:42:31 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Jul 2022 19:42:31 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3] In-Reply-To: References: Message-ID: <2J4lDx3_qn257A-CyHpI-VZObigqY3rHDhAO3TLCL0Y=.e383cb33-2b1e-48ed-9e02-e43b9d3d5ed1@github.com> On Wed, 6 Jul 2022 08:56:49 GMT, Ron Pressler wrote: >> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1322: >> >>> 1320: >>> 1321: __ pop(rax); // return address >>> 1322: // Read interpreter arguments into registers (this is an ad-hoc i2c adapter) >> >> If I understand this correctly, this allows you to avoid creating an interpreted frame. Pretty clever! > > Yeah, it's a hand-rolled i2c adapter. I thought of just calling `gen_i2c_adapter` in place, but that would have required changing it. If we had two separate nmethods, we could rely on the standard i2c, but having two nmethods for a single Method didn't seem safe at this time. We can revisit later. 
I don't think the interpreter version needs to be an nmethod. It could be more like the intrinsics generated by TemplateInterpreterGenerator::generate_method_entry, but let's revisit after jdk19. ------------- PR: https://git.openjdk.org/jdk19/pull/66 From dnsimon at openjdk.org Wed Jul 6 20:28:45 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 6 Jul 2022 20:28:45 GMT Subject: RFR: 8282420: JFR: Remove event handlers [v6] In-Reply-To: <_nkpPJNh2EvuBJc_uD38QQfTwsTAdm37JJMxFEcGbbE=.2876eb7f-79f1-4b75-9208-f8d9f82c86c7@github.com> References: <_nkpPJNh2EvuBJc_uD38QQfTwsTAdm37JJMxFEcGbbE=.2876eb7f-79f1-4b75-9208-f8d9f82c86c7@github.com> Message-ID: On Wed, 6 Jul 2022 19:49:58 GMT, Erik Gahlin wrote: >> src/hotspot/share/jfr/instrumentation/jfrResolution.hpp line 40: >> >>> 38: public: >>> 39: static void on_runtime_resolution(const CallInfo & info, TRAPS); >>> 40: static void on_c1_resolution(const GraphBuilder * builder, const ciKlass * holder, const ciMethod * target); >> >> Should the declarations of `on_c1_resolution` and `on_c2_resolution` be guarded with `COMPILER2_PRESENT` and `COMPILER1_PRESENT` since the method bodies are? > > Yes. It's been fixed. See: > https://github.com/openjdk/jdk/pull/8680 Yes, I see the bodies are guarded. I was referring to the declarations here in `jfrResolution.hpp`. 
Shouldn't it be something like: #ifdef COMPILER1 static void on_c1_resolution(const GraphBuilder * builder, const ciKlass * holder, const ciMethod * target); #endif #ifdef COMPILER2 static void on_c2_resolution(const Parse * parse, const ciKlass * holder, const ciMethod * target); #endif and likewise in `jfr.hpp`: #ifdef COMPILER2 static void on_resolution(const Parse* parse, const ciKlass* holder, const ciMethod* target); #endif #ifdef COMPILER1 static void on_resolution(const GraphBuilder* builder, const ciKlass* holder, const ciMethod* target); #endif ------------- PR: https://git.openjdk.org/jdk/pull/8383 From coleenp at openjdk.org Wed Jul 6 19:12:47 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Jul 2022 19:12:47 GMT Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 17:50:54 GMT, Ioi Lam wrote: >> `Forte::register_stub()` should be called only when the JVm is being instrumented by Forte (aka "Oracle Developer Studio") >> >> https://www.oracle.com/tools/developerstudio/downloads/developer-studio-jsp.html >> >> We currently always format the name of generated stubs and call `Forte::register_stub()`, which usually does nothing. >> >> Example: >> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2686-L2697 >> >> To improve start-up, we should check if Forte is enabled before formatting the name. >> >> I also renamed some `#ifndef IA64` around the code that I touched. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Do not remove Forte::register_stub as it is used on Linux as well src/hotspot/share/prims/forte.hpp line 32: > 30: class Forte : AllStatic { > 31: public: > 32: static bool is_enabled() NOT_JVMTI_RETURN_(false); I don't think the rest of this forte code is disabled by JVMTI. 
------------- PR: https://git.openjdk.org/jdk/pull/9386 From rehn at openjdk.org Thu Jul 7 07:17:40 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 7 Jul 2022 07:17:40 GMT Subject: RFR: 8286957: Held monitor count [v8] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:43:35 GMT, Robbin Ehn wrote: >> The current implementation does not count every monitor enter, counts high up in the abstraction, and causes a performance regression on aarch64 with some benchmarks due to C2 changes. >> >> This change makes the counting exact by pushing the counting down in the abstraction. >> The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". >> >> An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. >> >> Fixed aarch64, x64, x86 and zero. >> >> Passes t1-8 > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed strw, zero rename and made methods return 64 bit counter in all cases Incremental changes passed t1-7. @pron and @fisk can you please check whether the incremental changes are okay? Additionally, can @pron please take a careful look at continuationEntry.hpp and continuationFreezeThaw.cpp Thanks ------------- PR: https://git.openjdk.org/jdk/pull/8945 From bulasevich at openjdk.org Thu Jul 7 10:31:12 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 7 Jul 2022 10:31:12 GMT Subject: RFR: 8288477: nmethod header size reduction Message-ID: Each compiled method contains an nmethod header. In the trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header padding, and using one byte for the CompilerType and CompLevel values. Cleanup work: apply CompLevel type where applicable.
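The effect of the field ordering described above can be illustrated with a small standalone sketch. The struct and enum names here (`UnsortedHeader`, `SortedHeader`, `Level`) are made up for illustration and are not the HotSpot types; the exact sizes depend on the target ABI, so the sketch only checks that the sorted layout is no larger than the unsorted one and that a narrowed enum occupies one byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-byte enum, standing in for a compilation-level type.
enum class Level : std::int8_t { none = 0, simple = 1, full = 4 };

// Fields in arbitrary order: small members placed ahead of pointer-sized
// members typically force the compiler to insert alignment padding.
struct UnsortedHeader {
  std::int32_t type;         // enum-like value stored in 4 bytes
  void*        code_begin;
  bool         is_compiled;
  void*        code_end;
  std::int32_t level;        // enum-like value stored in 4 bytes
};

// Same information: largest members first, one-byte members grouped last.
struct SortedHeader {
  void*        code_begin;
  void*        code_end;
  std::int8_t  type;         // narrowed to one byte
  Level        level;        // one-byte enum
  bool         is_compiled;
};

static_assert(sizeof(Level) == 1, "narrowed enum should occupy one byte");
static_assert(sizeof(SortedHeader) <= sizeof(UnsortedHeader),
              "sorting fields by size should not grow the struct");

// Bytes saved by the reordering on the current platform (ABI-dependent).
long bytes_saved() {
  const long saved = static_cast<long>(sizeof(UnsortedHeader)) -
                     static_cast<long>(sizeof(SortedHeader));
  assert(saved >= 0);
  return saved;
}
```

The `ptype /o` dumps in this message show the same idea on the real classes, with the holes reported directly by gdb.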
The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime Renaissance benchmarks shows no performance regressions on x86 and aarch. BEFORE: (gdb) ptype /o CodeBlob /* offset | size */ type = class CodeBlob { /* 8 | 4 */ const CompilerType _type; <<<< /* 12 | 4 */ int _size; /* 16 | 4 */ int _header_size; /* 20 | 4 */ int _frame_complete_offset; /* 24 | 4 */ int _data_offset; /* 28 | 4 */ int _frame_size; /* 32 | 8 */ address _code_begin; /* 40 | 8 */ address _code_end; /* 48 | 8 */ address _content_begin; /* 56 | 8 */ address _data_end; /* 64 | 8 */ address _relocation_begin; /* 72 | 8 */ address _relocation_end; /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; /* 88 | 1 */ bool _caller_must_gc_arguments; /* 89 | 1 */ bool _is_compiled; /* XXX 6-byte hole */ /* 96 | 8 */ const char *_name; /* 104 | 8 */ class AsmRemarks { /* 104 | 8 */ AsmRemarkCollection *_remarks; } _asm_remarks; /* 112 | 8 */ class DbgStrings { /* 112 | 8 */ DbgStringCollection *_strings; } _dbg_strings; /* total size (bytes): 120 */ } AFTER: (gdb) ptype /o CodeBlob /* offset | size */ type = class CodeBlob { protected: /* 8 | 8 */ address _code_begin; /* 16 | 8 */ address _code_end; /* 24 | 8 */ address _content_begin; /* 32 | 8 */ address _data_end; /* 40 | 8 */ address _relocation_begin; /* 48 | 8 */ address _relocation_end; /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; /* 64 | 8 */ const char *_name; /* 72 | 4 */ int _size; /* 76 | 4 */ int _header_size; /* 80 | 4 */ int _frame_complete_offset; /* 84 | 4 */ int _data_offset; /* 88 | 4 */ int _frame_size; /* 92 | 1 */ bool _caller_must_gc_arguments; /* 93 | 1 */ bool _is_compiled; /* 94 | 1 */ const CompilerType _type; <<<< /* XXX 1-byte hole */ /* 96 | 8 */ class AsmRemarks { /* 96 | 8 */ AsmRemarkCollection *_remarks; } _asm_remarks; /* 104 | 8 */ class DbgStrings { /* 104 | 8 */ DbgStringCollection *_strings; } _dbg_strings; /* total size (bytes): 112 */ } BEFORE: (gdb) ptype /o nmethod /* offset 
| size */ type = class nmethod : public CompiledMethod { private: /* 208 | 4 */ int _entry_bci; /* XXX 4-byte hole */ /* 216 | 8 */ uint64_t _gc_epoch; /* 224 | 8 */ nmethod *_osr_link; /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; /* 240 | 8 */ address _entry_point; /* 248 | 8 */ address _verified_entry_point; /* 256 | 8 */ address _osr_entry_point; /* 264 | 4 */ int _exception_offset; /* 268 | 4 */ int _unwind_handler_offset; /* 272 | 4 */ int _consts_offset; /* 276 | 4 */ int _stub_offset; /* 280 | 4 */ int _oops_offset; /* 284 | 4 */ int _metadata_offset; /* 288 | 4 */ int _scopes_data_offset; /* 292 | 4 */ int _scopes_pcs_offset; /* 296 | 4 */ int _dependencies_offset; /* 300 | 4 */ int _handler_table_offset; /* 304 | 4 */ int _nul_chk_table_offset; /* 308 | 4 */ int _speculations_offset; /* 312 | 4 */ int _jvmci_data_offset; /* 316 | 4 */ int _nmethod_end_offset; /* 320 | 4 */ int _orig_pc_offset; /* 324 | 4 */ int _compile_id; /* 328 | 4 */ int _comp_level; <<<< /* 332 | 1 */ bool _has_flushed_dependencies; /* 333 | 1 */ bool _unload_reported; /* 334 | 1 */ bool _load_reported; /* 335 | 1 */ volatile signed char _state; /* 336 | 1 */ bool _oops_are_stale; /* XXX 3-byte hole */ /* 340 | 4 */ RTMState _rtm_state; /* 344 | 4 */ volatile jint _lock_count; /* XXX 4-byte hole */ /* 352 | 8 */ volatile int64_t _stack_traversal_mark; /* 360 | 4 */ int _hotness_counter; /* 364 | 1 */ volatile uint8_t _is_unloading_state; /* XXX 3-byte hole */ /* 368 | 4 */ ByteSize _native_receiver_sp_offset; /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; /* total size (bytes): 376 */ } AFTER: (gdb) ptype /o nmethod /* offset | size */ type = class nmethod : public CompiledMethod { /* 200 | 8 */ uint64_t _gc_epoch; /* 208 | 8 */ volatile int64_t _stack_traversal_mark; /* 216 | 8 */ nmethod *_osr_link; /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; /* 232 | 8 */ address _entry_point; /* 240 | 8 */ address _verified_entry_point; 
/* 248 | 8 */ address _osr_entry_point; /* 256 | 4 */ int _entry_bci; /* 260 | 4 */ int _exception_offset; /* 264 | 4 */ int _unwind_handler_offset; /* 268 | 4 */ int _consts_offset; /* 272 | 4 */ int _stub_offset; /* 276 | 4 */ int _oops_offset; /* 280 | 4 */ int _metadata_offset; /* 284 | 4 */ int _scopes_data_offset; /* 288 | 4 */ int _scopes_pcs_offset; /* 292 | 4 */ int _dependencies_offset; /* 296 | 4 */ int _handler_table_offset; /* 300 | 4 */ int _nul_chk_table_offset; /* 304 | 4 */ int _speculations_offset; /* 308 | 4 */ int _jvmci_data_offset; /* 312 | 4 */ int _nmethod_end_offset; /* 316 | 4 */ int _orig_pc_offset; /* 320 | 4 */ int _compile_id; /* 324 | 4 */ RTMState _rtm_state; /* 328 | 4 */ volatile jint _lock_count; /* 332 | 4 */ int _hotness_counter; /* 336 | 4 */ ByteSize _native_receiver_sp_offset; /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; /* 344 | 1 */ CompLevel _comp_level; <<<< /* 345 | 1 */ volatile uint8_t _is_unloading_state; /* 346 | 1 */ bool _has_flushed_dependencies; /* 347 | 1 */ bool _unload_reported; /* 348 | 1 */ bool _load_reported; /* 349 | 1 */ volatile signed char _state; /* 350 | 1 */ bool _oops_are_stale; /* total size (bytes): 352 */ } ------------- Commit messages: - add a comment - - reorder fields in CodeBlob and nmethod stucts to avoid internal alignment Changes: https://git.openjdk.org/jdk/pull/9165/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9165&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288477 Stats: 200 lines in 23 files changed: 61 ins; 55 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/9165.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9165/head:pull/9165 PR: https://git.openjdk.org/jdk/pull/9165 From mcimadamore at openjdk.org Wed Jul 6 21:50:36 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 6 Jul 2022 21:50:36 GMT Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v6] In-Reply-To: References: Message-ID: > This is a JDK 19 
clone of: https://github.com/openjdk/jdk/pull/9017 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Turn non-closeable view back into MemorySession impl ------------- Changes: - all: https://git.openjdk.org/jdk19/pull/22/files - new: https://git.openjdk.org/jdk19/pull/22/files/09bb7cf3..809a0a2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk19&pr=22&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk19&pr=22&range=04-05 Stats: 79 lines in 14 files changed: 7 ins; 11 del; 61 mod Patch: https://git.openjdk.org/jdk19/pull/22.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/22/head:pull/22 PR: https://git.openjdk.org/jdk19/pull/22 From jwilhelm at openjdk.org Wed Jul 6 21:03:47 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Wed, 6 Jul 2022 21:03:47 GMT Subject: Integrated: Merge jdk19 In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:01:02 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 19 -> JDK 20 This pull request has now been integrated. Changeset: 2a6ec88c Author: Jesper Wilhelmsson URL: https://git.openjdk.org/jdk/commit/2a6ec88cd09adec43df3da1b22653271517b14a8 Stats: 888 lines in 24 files changed: 705 ins; 64 del; 119 mod Merge ------------- PR: https://git.openjdk.org/jdk/pull/9397 From rpressler at openjdk.org Wed Jul 6 20:56:00 2022 From: rpressler at openjdk.org (Ron Pressler) Date: Wed, 6 Jul 2022 20:56:00 GMT Subject: [jdk19] Integrated: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing In-Reply-To: References: Message-ID: On Fri, 24 Jun 2022 09:23:26 GMT, Ron Pressler wrote: > Please review the following bug fix: > > `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. > > Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. 
But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. > > This change does three things: > > 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. > 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. > 3. In interp_only_mode, the c2i stub will not patch the callsite. > > This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 > > > Passes tiers 1-4 and Loom tiers 1-5. This pull request has now been integrated. Changeset: 9a0fa824 Author: Ron Pressler URL: https://git.openjdk.org/jdk19/commit/9a0fa8242461afe9ee4bcf80523af13500c9c1f2 Stats: 218 lines in 10 files changed: 189 ins; 10 del; 19 mod 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing Reviewed-by: dlong, eosterlund, rehn ------------- PR: https://git.openjdk.org/jdk19/pull/66 From duke at openjdk.org Thu Jul 7 10:31:12 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 7 Jul 2022 10:31:12 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: Message-ID: On Wed, 15 Jun 2022 09:30:59 GMT, Boris Ulasevich wrote: > Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. 
>
> The change was tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime
>
> Renaissance benchmarks show no performance regressions on x86 and aarch.
>
> BEFORE:
>
> (gdb) ptype /o CodeBlob
> /* offset | size */ type = class CodeBlob {
> /*      8 |    4 */   const CompilerType _type; <<<<
> /*     12 |    4 */   int _size;
> /*     16 |    4 */   int _header_size;
> /*     20 |    4 */   int _frame_complete_offset;
> /*     24 |    4 */   int _data_offset;
> /*     28 |    4 */   int _frame_size;
> /*     32 |    8 */   address _code_begin;
> /*     40 |    8 */   address _code_end;
> /*     48 |    8 */   address _content_begin;
> /*     56 |    8 */   address _data_end;
> /*     64 |    8 */   address _relocation_begin;
> /*     72 |    8 */   address _relocation_end;
> /*     80 |    8 */   ImmutableOopMapSet *_oop_maps;
> /*     88 |    1 */   bool _caller_must_gc_arguments;
> /*     89 |    1 */   bool _is_compiled;
> /* XXX 6-byte hole */
> /*     96 |    8 */   const char *_name;
> /*    104 |    8 */   class AsmRemarks {
> /*    104 |    8 */     AsmRemarkCollection *_remarks;
>                       } _asm_remarks;
> /*    112 |    8 */   class DbgStrings {
> /*    112 |    8 */     DbgStringCollection *_strings;
>                       } _dbg_strings;
>
> /* total size (bytes): 120 */
> }
>
> AFTER:
>
> (gdb) ptype /o CodeBlob
> /* offset | size */ type = class CodeBlob {
>                     protected:
> /*      8 |    8 */   address _code_begin;
> /*     16 |    8 */   address _code_end;
> /*     24 |    8 */   address _content_begin;
> /*     32 |    8 */   address _data_end;
> /*     40 |    8 */   address _relocation_begin;
> /*     48 |    8 */   address _relocation_end;
> /*     56 |    8 */   ImmutableOopMapSet *_oop_maps;
> /*     64 |    8 */   const char *_name;
> /*     72 |    4 */   int _size;
> /*     76 |    4 */   int _header_size;
> /*     80 |    4 */   int _frame_complete_offset;
> /*     84 |    4 */   int _data_offset;
> /*     88 |    4 */   int _frame_size;
> /*     92 |    1 */   bool _caller_must_gc_arguments;
> /*     93 |    1 */   bool _is_compiled;
> /*     94 |    1 */   const CompilerType _type; <<<<
> /* XXX 1-byte hole */
> /*     96 |    8 */   class AsmRemarks {
> /*     96 |    8 */     AsmRemarkCollection *_remarks;
>                       } _asm_remarks;
> /*    104 |    8 */   class DbgStrings {
> /*    104 |    8 */     DbgStringCollection *_strings;
>                       } _dbg_strings;
>
> /* total size (bytes): 112 */
> }
>
> BEFORE:
>
> (gdb) ptype /o nmethod
> /* offset | size */ type = class nmethod : public CompiledMethod {
>                     private:
> /*    208 |    4 */   int _entry_bci;
> /* XXX 4-byte hole */
> /*    216 |    8 */   uint64_t _gc_epoch;
> /*    224 |    8 */   nmethod *_osr_link;
> /*    232 |    8 */   nmethod::oops_do_mark_link * volatile _oops_do_mark_link;
> /*    240 |    8 */   address _entry_point;
> /*    248 |    8 */   address _verified_entry_point;
> /*    256 |    8 */   address _osr_entry_point;
> /*    264 |    4 */   int _exception_offset;
> /*    268 |    4 */   int _unwind_handler_offset;
> /*    272 |    4 */   int _consts_offset;
> /*    276 |    4 */   int _stub_offset;
> /*    280 |    4 */   int _oops_offset;
> /*    284 |    4 */   int _metadata_offset;
> /*    288 |    4 */   int _scopes_data_offset;
> /*    292 |    4 */   int _scopes_pcs_offset;
> /*    296 |    4 */   int _dependencies_offset;
> /*    300 |    4 */   int _handler_table_offset;
> /*    304 |    4 */   int _nul_chk_table_offset;
> /*    308 |    4 */   int _speculations_offset;
> /*    312 |    4 */   int _jvmci_data_offset;
> /*    316 |    4 */   int _nmethod_end_offset;
> /*    320 |    4 */   int _orig_pc_offset;
> /*    324 |    4 */   int _compile_id;
> /*    328 |    4 */   int _comp_level; <<<<
> /*    332 |    1 */   bool _has_flushed_dependencies;
> /*    333 |    1 */   bool _unload_reported;
> /*    334 |    1 */   bool _load_reported;
> /*    335 |    1 */   volatile signed char _state;
> /*    336 |    1 */   bool _oops_are_stale;
> /* XXX 3-byte hole */
> /*    340 |    4 */   RTMState _rtm_state;
> /*    344 |    4 */   volatile jint _lock_count;
> /* XXX 4-byte hole */
> /*    352 |    8 */   volatile int64_t _stack_traversal_mark;
> /*    360 |    4 */   int _hotness_counter;
> /*    364 |    1 */   volatile uint8_t _is_unloading_state;
> /* XXX 3-byte hole */
> /*    368 |    4 */   ByteSize _native_receiver_sp_offset;
> /*    372 |    4 */   ByteSize _native_basic_lock_sp_offset;
>
> /* total size (bytes): 376 */
> }
>
> AFTER:
>
> (gdb) ptype /o nmethod
> /* offset | size */ type = class nmethod : public CompiledMethod {
> /*    200 |    8 */   uint64_t _gc_epoch;
> /*    208 |    8 */   volatile int64_t _stack_traversal_mark;
> /*    216 |    8 */   nmethod *_osr_link;
> /*    224 |    8 */   nmethod::oops_do_mark_link * volatile _oops_do_mark_link;
> /*    232 |    8 */   address _entry_point;
> /*    240 |    8 */   address _verified_entry_point;
> /*    248 |    8 */   address _osr_entry_point;
> /*    256 |    4 */   int _entry_bci;
> /*    260 |    4 */   int _exception_offset;
> /*    264 |    4 */   int _unwind_handler_offset;
> /*    268 |    4 */   int _consts_offset;
> /*    272 |    4 */   int _stub_offset;
> /*    276 |    4 */   int _oops_offset;
> /*    280 |    4 */   int _metadata_offset;
> /*    284 |    4 */   int _scopes_data_offset;
> /*    288 |    4 */   int _scopes_pcs_offset;
> /*    292 |    4 */   int _dependencies_offset;
> /*    296 |    4 */   int _handler_table_offset;
> /*    300 |    4 */   int _nul_chk_table_offset;
> /*    304 |    4 */   int _speculations_offset;
> /*    308 |    4 */   int _jvmci_data_offset;
> /*    312 |    4 */   int _nmethod_end_offset;
> /*    316 |    4 */   int _orig_pc_offset;
> /*    320 |    4 */   int _compile_id;
> /*    324 |    4 */   RTMState _rtm_state;
> /*    328 |    4 */   volatile jint _lock_count;
> /*    332 |    4 */   int _hotness_counter;
> /*    336 |    4 */   ByteSize _native_receiver_sp_offset;
> /*    340 |    4 */   ByteSize _native_basic_lock_sp_offset;
> /*    344 |    1 */   CompLevel _comp_level; <<<<
> /*    345 |    1 */   volatile uint8_t _is_unloading_state;
> /*    346 |    1 */   bool _has_flushed_dependencies;
> /*    347 |    1 */   bool _unload_reported;
> /*    348 |    1 */   bool _load_reported;
> /*    349 |    1 */   volatile signed char _state;
> /*    350 |    1 */   bool _oops_are_stale;
>
> /* total size (bytes): 352 */
> }

Overall, lgtm.

lgtm

It would be worth adding a comment to `CodeBlob` saying that the fields are deliberately ordered to reduce its size. Otherwise someone might break the order in the future.

src/hotspot/share/code/nmethod.cpp line 651:

> 649: #endif
> 650: _compile_id = compile_id;
> 651: _comp_level = CompLevel_none;

Why do we need to move it to line 626?
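
The padding effect visible in the `ptype /o` dumps can be reproduced in a small standalone sketch. The types below (`Unsorted`, `Sorted`, `Level`) are hypothetical stand-ins, not the HotSpot declarations, and the exact sizes depend on the platform ABI; on a typical LP64 target the first struct is expected to be larger than the second.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical one-byte enum, analogous in spirit to `enum CompLevel : s1`.
enum class Level : std::int8_t { none = 0, simple = 1, full = 4 };
static_assert(sizeof(Level) == 1, "the underlying type fixes the size to one byte");

// Fields in an arbitrary order: the compiler inserts holes so that every
// member starts at a multiple of its own alignment.
struct Unsorted {
  int           entry_bci;    // 4 bytes, usually followed by a 4-byte hole
  std::uint64_t gc_epoch;     // 8 bytes
  int           comp_level;   // 4 bytes for a value that fits in one byte
  bool          is_loaded;    // 1 byte, usually followed by a hole
  void*         entry_point;  // pointer-sized
};

// Same information, largest members first, with the level stored in one byte.
struct Sorted {
  std::uint64_t gc_epoch;
  void*         entry_point;
  int           entry_bci;
  Level         comp_level;
  bool          is_loaded;
};

// Bytes saved per object by the reordered layout on the current platform.
constexpr std::size_t bytes_saved_by_sorting() {
  return sizeof(Unsorted) - sizeof(Sorted);
}

// Runtime sanity check: sorting by decreasing alignment should never grow the struct.
inline void verify_layout() {
  assert(sizeof(Sorted) <= sizeof(Unsorted));
}
```

Ordering by decreasing alignment leaves at most tail padding, which is why the reviewer's request for a comment documenting the ordering seems reasonable: a later field inserted in the wrong place would silently bring the holes back.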
src/hotspot/share/compiler/compilerDefinitions.hpp line 57:

> 55:
> 56: // Enumeration to distinguish tiers of compilation
> 57: enum CompLevel : s1 {

I hope it won't cause gcc to generate inefficient code when manipulating bytes.

-------------

PR: https://git.openjdk.org/jdk/pull/9165

Marked as reviewed by eastig at github.com (no known OpenJDK username).

From mcimadamore at openjdk.org  Wed Jul  6 21:50:39 2022
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Wed, 6 Jul 2022 21:50:39 GMT
Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 6 Jul 2022 18:01:28 GMT, Maurizio Cimadamore wrote:

>> This is a JDK 19 clone of: https://github.com/openjdk/jdk/pull/9017
>
> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>
> - Merge branch 'master' into memory_session_cleanup
> - Fix ambiguity between session vs. base session
> - Revert implicit vs. heap session changes
> - Unify heap vs. implicit scopes
> - Merge branch 'master' into memory_session_cleanup
> - Fix issue in Direct-X-Buffer template
> - Simplify and drop the state class
> - Add missing files
> - Initial push

I've fixed the issues in the review comments, and further reduced the differences between the contents of this patch and mainline (while retaining the improvements).
-------------

PR: https://git.openjdk.org/jdk19/pull/22

From duke at openjdk.org  Wed Jul  6 22:53:40 2022
From: duke at openjdk.org (Tongbao Zhang)
Date: Wed, 6 Jul 2022 22:53:40 GMT
Subject: Integrated: 8289436: Make the redefine timer statistics more accurate
In-Reply-To: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com>
References: <0IXtkRmPFHL6LxPiSon40rtvEYfXwm1Ws4lC1WcoKIc=.0610468b-10a4-41f3-ab49-57e7890c2a4f@github.com>
Message-ID:

On Wed, 29 Jun 2022 08:30:12 GMT, Tongbao Zhang wrote:

> Make the redefine timer statistics more accurate
>
> After some significant performance improvements of the class redefinition, like:
> https://bugs.openjdk.org/browse/JDK-8139551
> https://bugs.openjdk.org/browse/JDK-8078725
>
> Some time-consuming operations were moved out of "redefine_single_class".
> So the sum of the phase 1 and phase 2 times no longer accurately represents the time of "vmop_doit".

This pull request has now been integrated.

Changeset: 403a9bc7
Author: Tongbao Zhang
Committer: Jie Fu
URL: https://git.openjdk.org/jdk/commit/403a9bc79645018ee61b47bab67fe231577dd914
Stats: 10 lines in 2 files changed: 8 ins; 1 del; 1 mod

8289436: Make the redefine timer statistics more accurate

Reviewed-by: sspitsyn, cjplummer, lmesnik

-------------

PR: https://git.openjdk.org/jdk/pull/9322

From jiefu at openjdk.org  Thu Jul  7 01:13:51 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Thu, 7 Jul 2022 01:13:51 GMT
Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633
In-Reply-To:
References:
Message-ID:

On Wed, 6 Jul 2022 16:19:00 GMT, Thomas Stuefe wrote:

> I'm curious, which tests did show the bug? Because I ran GHAs and a selection of our nightlies.

This bug can be exposed by `java -XX:+UseZGC` on some of our cloud machines running os/tlinux; not all platforms crash (e.g., I could not reproduce it on Ubuntu 20.04).
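
The rule behind this bug, that memory handed out by the C library must go back through the C library's `free`, can be sketched outside HotSpot. The helper below is hypothetical (it is not `ZMountPoint` code) and assumes a C library that supports the POSIX.1-2008 `m` assignment-allocation modifier for `sscanf`, as glibc does; it reuses the format string quoted in this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: returns the mount point from a mountinfo-style line
// if the filesystem type matches, or an empty string otherwise.
std::string mountpoint_if_matching(const char* line, const char* filesystem) {
  char* mountpoint = nullptr;
  char* fstype = nullptr;
  std::string result;

  // With the 'm' modifier, sscanf allocates each string itself via malloc().
  const int matched = std::sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms",
                                  &mountpoint, &fstype);
  if (matched == 2 && std::strcmp(fstype, filesystem) == 0) {
    result = mountpoint;  // copy before releasing the raw buffer
  }

  // Release with the matching deallocator. A wrapper that expects its own
  // bookkeeping header in front of the block (as the thread describes for
  // os::free() with NMT) must not be used on these pointers.
  // free(nullptr) is a no-op, so partial matches are handled as well.
  std::free(mountpoint);
  std::free(fstype);
  return result;
}

// Small self-check with a made-up mountinfo line.
inline void demo_mountpoint() {
  const char* line = "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw";
  assert(mountpoint_if_matching(line, "ext3") == "/mnt2");
}
```

Copying into a `std::string` before freeing is a choice made for this sketch only; the HotSpot code in the thread returns the raw pointer, which is why its caller also has to know which deallocator applies.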
-------------

PR: https://git.openjdk.org/jdk/pull/9387

From lucy at openjdk.org  Thu Jul  7 08:31:40 2022
From: lucy at openjdk.org (Lutz Schmidt)
Date: Thu, 7 Jul 2022 08:31:40 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2]
In-Reply-To:
References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com>
Message-ID:

On Thu, 7 Jul 2022 08:25:44 GMT, Matthias Baesken wrote:

> So is the common opinion to get back to a separate JIT start event (I think that naming is preferred over JIT restart, am I correct)?
> Additionally we add CodeCache::max_capacity() at both JIT start and EventCodeCacheFull?

I would be in favour of that.

-------------

PR: https://git.openjdk.org/jdk/pull/9334

From jiefu at openjdk.org  Thu Jul  7 01:08:14 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Thu, 7 Jul 2022 01:08:14 GMT
Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v2]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling.
>
> For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60.
> And `line` will be allocated by `getline` @line84.
>
>
> 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const {
> 54   char* line_mountpoint = NULL;
> 55   char* line_filesystem = NULL;
> 56
> 57   // Parse line and return a newly allocated string containing the mount point if
> 58   // the line contains a matching filesystem and the mount point is accessible by
> 59   // the current user.
> 60   if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 ||
> 61       strcmp(line_filesystem, filesystem) != 0 ||
> 62       access(line_mountpoint, R_OK|W_OK|X_OK) != 0) {
> 63     // Not a matching or accessible filesystem
> 64     os::free(line_mountpoint);
> 65     line_mountpoint = NULL;
> 66   }
> 67
> 68   os::free(line_filesystem);
> 69
> 70   return line_mountpoint;
> 71 }
> 72
> 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const {
> 74   FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r");
> 75   if (fd == NULL) {
> 76     ZErrno err;
> 77     log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string());
> 78     return;
> 79   }
> 80
> 81   char* line = NULL;
> 82   size_t length = 0;
> 83
> 84   while (getline(&line, &length, fd) != -1) {
> 85     char* const mountpoint = get_mountpoint(line, filesystem);
> 86     if (mountpoint != NULL) {
> 87       mountpoints->append(mountpoint);
> 88     }
> 89   }
> 90
> 91   os::free(line);
> 92   fclose(fd);
> 93 }
>
>
> See the analysis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477
>
> That means we have raw `::malloc() -> os::free()`, which is unbalanced.
> Raw `::malloc()` does not write the header `os::free()` expects.
> If NMT is on, we assert now, because NMT does not find its header in os::free().
>
>
> The fix just reverts `os::free()` to `::free()`.
>
> Testing:
> - hotspot/jtreg/gc/z on Linux/x64, all passed
>
> Thanks.
> Best regards,
> Jie

Jie Fu has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9387/files
  - new: https://git.openjdk.org/jdk/pull/9387/files/c487491d..8ae3bfcf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9387&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9387&range=00-01

  Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/9387.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9387/head:pull/9387

PR: https://git.openjdk.org/jdk/pull/9387

From ngasson at openjdk.org  Thu Jul  7 08:55:32 2022
From: ngasson at openjdk.org (Nick Gasson)
Date: Thu, 7 Jul 2022 08:55:32 GMT
Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2
In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com>
References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com>
Message-ID:

On Thu, 7 Jul 2022 04:10:57 GMT, Yi-Fan Tsai wrote:

> A trampoline stub may be generated for each runtime call. These trampolines are duplicates when the callees are the same. This change delays stub generation and emits one stub per distinct callee.
>
> The Renaissance (0.14.1) benchmarks als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http were tested. The combined used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows a ~4.7% reduction on average.

src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 38:

> 36: }
> 37:
> 38: template

Why does this need to be a template? It seems to only ever be instantiated with one concrete class (the real MacroAssembler).
-------------

PR: https://git.openjdk.org/jdk/pull/9405

From jiefu at openjdk.org  Thu Jul  7 01:13:52 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Thu, 7 Jul 2022 01:13:52 GMT
Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633
In-Reply-To:
References:
Message-ID:

On Thu, 7 Jul 2022 01:10:38 GMT, Jie Fu wrote:

>> I'm curious, which tests did show the bug? Because I ran GHAs and a selection of our nightlies.
>
> This bug can be exposed by `java -XX:+UseZGC` on some of our cloud machines with os/tlinux, not all the platforms would crash (e.g., I didn't reproduce it on Ubuntu20.04).

> Hi @DamonFool ,
>
> thanks for the quick fix. Embarrassing, I should have caught this.
>
> Can you please add `#include "utilities/globalDefinitions.hpp"` for the ALLOW_... macros?
>
> Otherwise, apart from the comment changes below, fine.
>
> Cheers, Thomas

Thanks @tstuefe for your review.
All the comments have been addressed.
Thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/9387

From dholmes at openjdk.org  Thu Jul  7 08:14:28 2022
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 7 Jul 2022 08:14:28 GMT
Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 7 Jul 2022 01:08:14 GMT, Jie Fu wrote:

>> Hi all,
>>
>> ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling.
>>
>> For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60.
>> And `line` will be allocated by `getline` @line84.
>> >> >> 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { >> 54 char* line_mountpoint = NULL; >> 55 char* line_filesystem = NULL; >> 56 >> 57 // Parse line and return a newly allocated string containing the mount point if >> 58 // the line contains a matching filesystem and the mount point is accessible by >> 59 // the current user. >> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || >> 61 strcmp(line_filesystem, filesystem) != 0 || >> 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { >> 63 // Not a matching or accessible filesystem >> 64 os::free(line_mountpoint); >> 65 line_mountpoint = NULL; >> 66 } >> 67 >> 68 os::free(line_filesystem); >> 69 >> 70 return line_mountpoint; >> 71 } >> 72 >> 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { >> 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); >> 75 if (fd == NULL) { >> 76 ZErrno err; >> 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); >> 78 return; >> 79 } >> 80 >> 81 char* line = NULL; >> 82 size_t length = 0; >> 83 >> 84 while (getline(&line, &length, fd) != -1) { >> 85 char* const mountpoint = get_mountpoint(line, filesystem); >> 86 if (mountpoint != NULL) { >> 87 mountpoints->append(mountpoint); >> 88 } >> 89 } >> 90 >> 91 os::free(line); >> 92 fclose(fd); >> 93 } >> >> >> See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 >> >> That means we have raw `::malloc() -> os::free()`, which is unbalanced. >> Raw `::malloc()` does not write the header `os::free()` expects. >> If NMT is on, we assert now, because NMT does not find its header in os::free(). >> >> >> The fix just reverts `os::free()` to `::free()`. >> >> Testing: >> - hotspot/jtreg/gc/z on Linux/x64, all passed >> >> Thanks. 
>> Best regards,
>> Jie
>
> Jie Fu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Address review comments

Looks good. One nit below re comments.

Thanks.

src/hotspot/os/linux/gc/z/zMountPoint_linux.cpp line 65:

> 63:       access(line_mountpoint, R_OK|W_OK|X_OK) != 0) {
> 64:     // Not a matching or accessible filesystem
> 65:     // sscanf, using %m, will return malloced memory. Need raw ::free, not os::free.

The comment could be lifted to before sscanf so that it covers both uses of ::free.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9387

From duke at openjdk.org  Thu Jul  7 04:19:22 2022
From: duke at openjdk.org (Yi-Fan Tsai)
Date: Thu, 7 Jul 2022 04:19:22 GMT
Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2
Message-ID: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com>

A trampoline stub may be generated for each runtime call. These trampolines are duplicates when the callees are the same. This change delays stub generation and emits one stub per distinct callee.

The Renaissance (0.14.1) benchmarks als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http were tested. The combined used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows a ~4.7% reduction on average.
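
The idea of deferring stub generation so that call sites with the same callee share one trampoline can be modelled with a map keyed by callee address. This is a minimal sketch under stated assumptions: `SharedTrampolines`, `Address` and the method names are hypothetical and do not come from the patch, which works on the AArch64 code buffer rather than on a standalone container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of a callee address; not a HotSpot type.
using Address = std::uintptr_t;

class SharedTrampolines {
 public:
  // Record that the call at `call_site_offset` needs a trampoline to `callee`.
  // Nothing is emitted yet; generation is deferred until all calls are known.
  void request(Address callee, std::size_t call_site_offset) {
    _requests[callee].push_back(call_site_offset);
  }

  // Stubs needed when every call site gets its own trampoline.
  std::size_t per_call_stub_count() const {
    std::size_t n = 0;
    for (const auto& entry : _requests) {
      n += entry.second.size();
    }
    return n;
  }

  // Stubs needed when each distinct callee gets exactly one trampoline.
  std::size_t shared_stub_count() const { return _requests.size(); }

 private:
  // callee -> offsets of all call sites that target it
  std::map<Address, std::vector<std::size_t>> _requests;
};

// Three calls, two distinct callees: sharing saves one stub.
inline void demo_trampolines() {
  SharedTrampolines t;
  t.request(0x1000, 0);
  t.request(0x1000, 16);
  t.request(0x2000, 32);
  assert(t.per_call_stub_count() == 3);
  assert(t.shared_stub_count() == 2);
}
```

The saving grows with the number of call sites per callee, which fits the code-heap reduction reported above, although the sketch says nothing about how the real stubs are laid out or relocated.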
-------------

Commit messages:
 - Remove dead codes
 - Reuse runtime call trampolines

Changes: https://git.openjdk.org/jdk/pull/9405/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8280152
  Stats: 205 lines in 5 files changed: 199 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/9405.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9405/head:pull/9405

PR: https://git.openjdk.org/jdk/pull/9405

From mbaesken at openjdk.org  Thu Jul  7 08:31:39 2022
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 7 Jul 2022 08:31:39 GMT
Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2]
In-Reply-To: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com>
References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com>
Message-ID:

On Tue, 5 Jul 2022 13:47:31 GMT, Matthias Baesken wrote:

>> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>
>   Incorporate JIT compiler restart into EventSweepCodeCache

So is the common opinion to get back to a separate JIT start event (I think that naming is preferred over JIT restart, am I correct)?
Additionally we add CodeCache::max_capacity() at both JIT start and EventCodeCacheFull?
-------------

PR: https://git.openjdk.org/jdk/pull/9334

From iklam at openjdk.org  Wed Jul  6 17:52:04 2022
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 6 Jul 2022 17:52:04 GMT
Subject: RFR: 8289780: Remove Forte::register_stub
In-Reply-To:
References:
Message-ID:

On Wed, 6 Jul 2022 00:38:14 GMT, Ioi Lam wrote:

> I removed `Forte::register_stub` since it's used only by the Solaris Forte(TM) Performance Tools collector, which is no longer supported by the JDK.
>
> I also fixed a couple of places where the stub name is computed unnecessarily.
>
> Also renamed some `#ifndef IA64` around the code that I touched.

It turns out that `collector_func_load` is used by OracleDeveloperStudio on Linux as well, so I can't remove the call to `collector_func_load`. I think this (weak) function will be non-null when OracleDeveloperStudio loads the JVM:

$ nm OracleDeveloperStudio12.6-linux-x86-bin/developerstudio12.6/lib/analyzer/amd64/libcollector.so \
    | grep collector_func_load
000000000002efd0 T __collector_func_load
000000000002efd0 W collector_func_load

So slight change of plan. I am adding a check to avoid computing the `blob_id`, etc., when `Forte::is_enabled()` is false, which will be the case most of the time.

-------------

PR: https://git.openjdk.org/jdk/pull/9386

From dlong at openjdk.org  Wed Jul  6 02:45:43 2022
From: dlong at openjdk.org (Dean Long)
Date: Wed, 6 Jul 2022 02:45:43 GMT
Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 5 Jul 2022 08:31:31 GMT, Ron Pressler wrote:

>> Please review the following bug fix:
>>
>> `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`.
>>
>> Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available.
But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. >> >> This change does three things: >> >> 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. >> 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. >> 3. In interp_only_mode, the c2i stub will not patch the callsite. >> >> This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 >> >> >> Passes tiers 1-4 and Loom tiers 1-5. > > Ron Pressler has updated the pull request incrementally with two additional commits since the last revision: > > - Add an "i2i" entry to enterSpecial > - Fix comment src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1322: > 1320: > 1321: __ pop(rax); // return address > 1322: // Read interpreter arguments into registers (this is an ad-hoc i2c adapter) If I understand this correctly, this allows you to avoid creating an interpreted frame. Pretty clever! ------------- PR: https://git.openjdk.org/jdk19/pull/66 From stuefe at openjdk.org Thu Jul 7 05:30:49 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Jul 2022 05:30:49 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v2] In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 01:08:14 GMT, Jie Fu wrote: >> Hi all, >> >> ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. >> >> For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. >> And `line` will be allocated by `getline` @line84. 
>> >> >> 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { >> 54 char* line_mountpoint = NULL; >> 55 char* line_filesystem = NULL; >> 56 >> 57 // Parse line and return a newly allocated string containing the mount point if >> 58 // the line contains a matching filesystem and the mount point is accessible by >> 59 // the current user. >> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || >> 61 strcmp(line_filesystem, filesystem) != 0 || >> 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { >> 63 // Not a matching or accessible filesystem >> 64 os::free(line_mountpoint); >> 65 line_mountpoint = NULL; >> 66 } >> 67 >> 68 os::free(line_filesystem); >> 69 >> 70 return line_mountpoint; >> 71 } >> 72 >> 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { >> 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); >> 75 if (fd == NULL) { >> 76 ZErrno err; >> 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); >> 78 return; >> 79 } >> 80 >> 81 char* line = NULL; >> 82 size_t length = 0; >> 83 >> 84 while (getline(&line, &length, fd) != -1) { >> 85 char* const mountpoint = get_mountpoint(line, filesystem); >> 86 if (mountpoint != NULL) { >> 87 mountpoints->append(mountpoint); >> 88 } >> 89 } >> 90 >> 91 os::free(line); >> 92 fclose(fd); >> 93 } >> >> >> See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 >> >> That means we have raw `::malloc() -> os::free()`, which is unbalanced. >> Raw `::malloc()` does not write the header `os::free()` expects. >> If NMT is on, we assert now, because NMT does not find its header in os::free(). >> >> >> The fix just reverts `os::free()` to `::free()`. >> >> Testing: >> - hotspot/jtreg/gc/z on Linux/x64, all passed >> >> Thanks. 
>> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Looks good! ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/9387 From coleenp at openjdk.org Wed Jul 6 14:04:41 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Jul 2022 14:04:41 GMT Subject: RFR: 8289164: Convert ResolutionErrorTable to use ResourceHashtable [v2] In-Reply-To: References: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com> Message-ID: On Fri, 1 Jul 2022 18:48:54 GMT, Justin Gu wrote: >> Please review my change of converting the resolutionErrorTable from hashtable to resource hashtable. I tested my changes with a mach5 tier1-4 test. > > Justin Gu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8289164: Convert ResolutionErrorTable to use ResourceHashtable Looks good! Nice change! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/9337 From iwalulya at openjdk.org Wed Jul 6 08:41:40 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 6 Jul 2022 08:41:40 GMT Subject: RFR: 8289739: Add G1 specific GC breakpoints for testing In-Reply-To: References: Message-ID: On Tue, 5 Jul 2022 11:35:19 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that adds a few G1 specific GC breakpoints for future testing [JDK-8289740](https://bugs.openjdk.org/browse/JDK-8289740). > > Testing: gha, local testing > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/9376 From tschatzl at openjdk.org Wed Jul 6 09:43:22 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 6 Jul 2022 09:43:22 GMT Subject: RFR: 8289739: Add G1 specific GC breakpoints for testing In-Reply-To: References: Message-ID: <8dBJRQjuS3bGZCVJ5_NvChbZPFz5J_DXqD0qttsxpeo=.3176e953-fd96-4b3d-a517-044a272255f5@github.com> On Wed, 6 Jul 2022 08:37:53 GMT, Ivan Walulya wrote: >> Hi all, >> >> can I have reviews for this change that adds a few G1 specific GC breakpoints for future testing [JDK-8289740](https://bugs.openjdk.org/browse/JDK-8289740). >> >> Testing: gha, local testing >> >> Thanks, >> Thomas > > Marked as reviewed by iwalulya (Reviewer). Thanks @walulyai @kimbarrett for your reviews ------------- PR: https://git.openjdk.org/jdk/pull/9376 From rpressler at openjdk.org Wed Jul 6 09:44:23 2022 From: rpressler at openjdk.org (Ron Pressler) Date: Wed, 6 Jul 2022 09:44:23 GMT Subject: [jdk19] RFR: 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing [v4] In-Reply-To: References: Message-ID: > Please review the following bug fix: > > `Continuation.enterSpecial` is a generated special nmethod (albeit not a Java method), with a well-known frame layout that calls `Continuation.enter`. > > Because it is compiled, it resolves the call to `Continuation.enter` to its compiled version, if available. But this results in the compiled `Continuation.enter` being called even when the thread is in interp_only_mode. > > This change does three things: > > 1. When entering interp_only_mode, `Continuation::set_cont_fastpath_thread_state` will clear enterSpecial's resolved callsite to Continuation.enter. > 2. In interp_only_mode, `SharedRuntime::resolve_static_call_C` will return `Continuation.enter`'s c2i entry rather than `verified_code_entry`. > 3. In interp_only_mode, the c2i stub will not patch the callsite. 
> > This fix isn't perfect, because a different thread, not in interp_only_mode, might patch the call. A longer-term solution is to create an "interpreted" version of `enterSpecial` and supporting an ad-hoc deoptimization. See https://bugs.openjdk.org/browse/JDK-8289128 > > > Passes tiers 1-4 and Loom tiers 1-5. Ron Pressler has updated the pull request incrementally with one additional commit since the last revision: Changes following review comments ------------- Changes: - all: https://git.openjdk.org/jdk19/pull/66/files - new: https://git.openjdk.org/jdk19/pull/66/files/7323f635..43f18e73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk19&pr=66&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk19&pr=66&range=02-03 Stats: 5 lines in 2 files changed: 2 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk19/pull/66.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/66/head:pull/66 PR: https://git.openjdk.org/jdk19/pull/66 From bulasevich at openjdk.org Thu Jul 7 10:31:13 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 7 Jul 2022 10:31:13 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: Message-ID: On Mon, 20 Jun 2022 20:21:58 GMT, Evgeny Astigeevich wrote: >> Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. >> >> Cleanup work: apply CompLevel type where applicable. >> >> The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >> >> Renaissance benchmarks shows no performance regressions on x86 and aarch. 
>> >> BEFORE: >> >> (gdb) ptype /o CodeBlob >> /* offset | size */ type = class CodeBlob { >> /* 8 | 4 */ const CompilerType _type; <<<< >> /* 12 | 4 */ int _size; >> /* 16 | 4 */ int _header_size; >> /* 20 | 4 */ int _frame_complete_offset; >> /* 24 | 4 */ int _data_offset; >> /* 28 | 4 */ int _frame_size; >> /* 32 | 8 */ address _code_begin; >> /* 40 | 8 */ address _code_end; >> /* 48 | 8 */ address _content_begin; >> /* 56 | 8 */ address _data_end; >> /* 64 | 8 */ address _relocation_begin; >> /* 72 | 8 */ address _relocation_end; >> /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; >> /* 88 | 1 */ bool _caller_must_gc_arguments; >> /* 89 | 1 */ bool _is_compiled; >> /* XXX 6-byte hole */ >> /* 96 | 8 */ const char *_name; >> /* 104 | 8 */ class AsmRemarks { >> /* 104 | 8 */ AsmRemarkCollection *_remarks; >> } _asm_remarks; >> /* 112 | 8 */ class DbgStrings { >> /* 112 | 8 */ DbgStringCollection *_strings; >> } _dbg_strings; >> >> /* total size (bytes): 120 */ >> } >> >> AFTER: >> >> (gdb) ptype /o CodeBlob >> /* offset | size */ type = class CodeBlob { >> protected: >> /* 8 | 8 */ address _code_begin; >> /* 16 | 8 */ address _code_end; >> /* 24 | 8 */ address _content_begin; >> /* 32 | 8 */ address _data_end; >> /* 40 | 8 */ address _relocation_begin; >> /* 48 | 8 */ address _relocation_end; >> /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; >> /* 64 | 8 */ const char *_name; >> /* 72 | 4 */ int _size; >> /* 76 | 4 */ int _header_size; >> /* 80 | 4 */ int _frame_complete_offset; >> /* 84 | 4 */ int _data_offset; >> /* 88 | 4 */ int _frame_size; >> /* 92 | 1 */ bool _caller_must_gc_arguments; >> /* 93 | 1 */ bool _is_compiled; >> /* 94 | 1 */ const CompilerType _type; <<<< >> /* XXX 1-byte hole */ >> /* 96 | 8 */ class AsmRemarks { >> /* 96 | 8 */ AsmRemarkCollection *_remarks; >> } _asm_remarks; >> /* 104 | 8 */ class DbgStrings { >> /* 104 | 8 */ DbgStringCollection *_strings; >> } _dbg_strings; >> >> /* total size (bytes): 112 */ >> } >> >> BEFORE: >> >> (gdb) ptype /o 
nmethod >> /* offset | size */ type = class nmethod : public CompiledMethod { >> private: >> /* 208 | 4 */ int _entry_bci; >> /* XXX 4-byte hole */ >> /* 216 | 8 */ uint64_t _gc_epoch; >> /* 224 | 8 */ nmethod *_osr_link; >> /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; >> /* 240 | 8 */ address _entry_point; >> /* 248 | 8 */ address _verified_entry_point; >> /* 256 | 8 */ address _osr_entry_point; >> /* 264 | 4 */ int _exception_offset; >> /* 268 | 4 */ int _unwind_handler_offset; >> /* 272 | 4 */ int _consts_offset; >> /* 276 | 4 */ int _stub_offset; >> /* 280 | 4 */ int _oops_offset; >> /* 284 | 4 */ int _metadata_offset; >> /* 288 | 4 */ int _scopes_data_offset; >> /* 292 | 4 */ int _scopes_pcs_offset; >> /* 296 | 4 */ int _dependencies_offset; >> /* 300 | 4 */ int _handler_table_offset; >> /* 304 | 4 */ int _nul_chk_table_offset; >> /* 308 | 4 */ int _speculations_offset; >> /* 312 | 4 */ int _jvmci_data_offset; >> /* 316 | 4 */ int _nmethod_end_offset; >> /* 320 | 4 */ int _orig_pc_offset; >> /* 324 | 4 */ int _compile_id; >> /* 328 | 4 */ int _comp_level; <<<< >> /* 332 | 1 */ bool _has_flushed_dependencies; >> /* 333 | 1 */ bool _unload_reported; >> /* 334 | 1 */ bool _load_reported; >> /* 335 | 1 */ volatile signed char _state; >> /* 336 | 1 */ bool _oops_are_stale; >> /* XXX 3-byte hole */ >> /* 340 | 4 */ RTMState _rtm_state; >> /* 344 | 4 */ volatile jint _lock_count; >> /* XXX 4-byte hole */ >> /* 352 | 8 */ volatile int64_t _stack_traversal_mark; >> /* 360 | 4 */ int _hotness_counter; >> /* 364 | 1 */ volatile uint8_t _is_unloading_state; >> /* XXX 3-byte hole */ >> /* 368 | 4 */ ByteSize _native_receiver_sp_offset; >> /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; >> >> /* total size (bytes): 376 */ >> } >> >> AFTER: >> >> (gdb) ptype /o nmethod >> /* offset | size */ type = class nmethod : public CompiledMethod { >> /* 200 | 8 */ uint64_t _gc_epoch; >> /* 208 | 8 */ volatile int64_t _stack_traversal_mark; >> /* 216 | 8 
*/ nmethod *_osr_link; >> /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; >> /* 232 | 8 */ address _entry_point; >> /* 240 | 8 */ address _verified_entry_point; >> /* 248 | 8 */ address _osr_entry_point; >> /* 256 | 4 */ int _entry_bci; >> /* 260 | 4 */ int _exception_offset; >> /* 264 | 4 */ int _unwind_handler_offset; >> /* 268 | 4 */ int _consts_offset; >> /* 272 | 4 */ int _stub_offset; >> /* 276 | 4 */ int _oops_offset; >> /* 280 | 4 */ int _metadata_offset; >> /* 284 | 4 */ int _scopes_data_offset; >> /* 288 | 4 */ int _scopes_pcs_offset; >> /* 292 | 4 */ int _dependencies_offset; >> /* 296 | 4 */ int _handler_table_offset; >> /* 300 | 4 */ int _nul_chk_table_offset; >> /* 304 | 4 */ int _speculations_offset; >> /* 308 | 4 */ int _jvmci_data_offset; >> /* 312 | 4 */ int _nmethod_end_offset; >> /* 316 | 4 */ int _orig_pc_offset; >> /* 320 | 4 */ int _compile_id; >> /* 324 | 4 */ RTMState _rtm_state; >> /* 328 | 4 */ volatile jint _lock_count; >> /* 332 | 4 */ int _hotness_counter; >> /* 336 | 4 */ ByteSize _native_receiver_sp_offset; >> /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; >> /* 344 | 1 */ CompLevel _comp_level; <<<< >> /* 345 | 1 */ volatile uint8_t _is_unloading_state; >> /* 346 | 1 */ bool _has_flushed_dependencies; >> /* 347 | 1 */ bool _unload_reported; >> /* 348 | 1 */ bool _load_reported; >> /* 349 | 1 */ volatile signed char _state; >> /* 350 | 1 */ bool _oops_are_stale; >> >> /* total size (bytes): 352 */ >> } > > src/hotspot/share/code/nmethod.cpp line 651: > >> 649: #endif >> 650: _compile_id = compile_id; >> 651: _comp_level = CompLevel_none; > > Why do we need to move it to line 626? I put it back. Thanks. > src/hotspot/share/compiler/compilerDefinitions.hpp line 57: > >> 55: >> 56: // Enumeration to distinguish tiers of compilation >> 57: enum CompLevel : s1 { > > Hope it won't cause gcc to generate inefficient code manupulating bytes. 
AARCH and AMD have load byte instructions (ldr:ldrb, mov:movzx), I believe method::comp_level() code takes the same number of instructions before/after the change. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From jvernee at openjdk.org Wed Jul 6 17:00:47 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 6 Jul 2022 17:00:47 GMT Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v4] In-Reply-To: References: Message-ID: <9ZEADSzSlDzVxEDuF3KT1bgynXFLLtLmXZi1XszM0D0=.ff3a84ef-d38b-4daa-920e-acee3ee97e6c@github.com> On Fri, 17 Jun 2022 18:39:03 GMT, Maurizio Cimadamore wrote: >> This is a JDK 19 clone of: https://github.com/openjdk/jdk/pull/9017 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Revert implicit vs. heap session changes Am I understanding correctly that, before calling `checkValidState` we also need to always call `baseSession()`? `checkValidState()` just calls `checkValidStateRaw` in a try/catch, and for non-closeable views this always throws. There seem to be many cases where `baseSession()` is not called... 
------------- PR: https://git.openjdk.org/jdk19/pull/22 From egahlin at openjdk.org Wed Jul 6 19:53:40 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 6 Jul 2022 19:53:40 GMT Subject: RFR: 8282420: JFR: Remove event handlers [v6] In-Reply-To: References: Message-ID: <_nkpPJNh2EvuBJc_uD38QQfTwsTAdm37JJMxFEcGbbE=.2876eb7f-79f1-4b75-9208-f8d9f82c86c7@github.com> On Wed, 6 Jul 2022 16:50:42 GMT, Doug Simon wrote: >> Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor fixes > > src/hotspot/share/jfr/instrumentation/jfrResolution.hpp line 40: > >> 38: public: >> 39: static void on_runtime_resolution(const CallInfo & info, TRAPS); >> 40: static void on_c1_resolution(const GraphBuilder * builder, const ciKlass * holder, const ciMethod * target); > > Should the declarations of `on_c1_resolution` and `on_c2_resolution` be guarded with `COMPILER2_PRESENT` and `COMPILER1_PRESENT` since the method bodies are? Yes. It's been fixed. See: https://github.com/openjdk/jdk/pull/8680 ------------- PR: https://git.openjdk.org/jdk/pull/8383 From iotsakp at gmail.com Wed Jul 6 18:19:22 2022 From: iotsakp at gmail.com (Ioannis Tsakpinis) Date: Wed, 6 Jul 2022 21:19:22 +0300 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> Message-ID: Hey all, Afaik, the Java->native transition is required because an upcall back to Java during the JNI downcall would otherwise crash the JVM. Are transitions required for any other reason? If not, would it be viable to move the expensive part of the transition (or even all of it) to the upcall? I.e. something like: 1. On downcall, do not transition but set a thread state flag. 2. 
If an upcall happens, check the flag and do the Java->native transition, before proceeding as usual. 3. When the downcall returns, do the native->Java transition only if #2 happened. (assuming #1 and the no-transition #3 happy path can be implemented cheaply, e.g. no expensive memory barriers) Note that, unlike Panama, JNI upcalls are already extremely expensive compared to downcalls and every application out there should easily cope with a bit more overhead. I would say that even in Panama it would make sense to move overhead to upcalls, if it made downcalls faster. Btw, for those wondering how an upcall-during-a-downcall could unintentionally happen in a real-world application, a good example is OpenGL debug mode. Normally, an OpenGL application may be doing hundreds, if not thousands, of JNI downcalls per frame. In a well-designed rendering engine, 99% of those would be asynchronous calls that simply pass some data to the driver and return immediately. A perfect candidate to apply JNI CriticalNatives on. However, if the application is started with a debug OpenGL context and it registers a debug message callback, then suddenly almost every OpenGL call has the potential to call back into Java to report an error. Similar functionality exists in several other APIs (such as OpenCL, OpenXR and Vulkan). - Ioannis On Wed, 6 Jul 2022 at 16:54, Erik Osterlund wrote: > > For completeness, it should at least be considered that an alternative on the table is to make the JNI transitions fast using asymmetric dekker synchronization. > > If I understood the problem domain, you are running on linux, and not really using the async GC locking associated with exposing object addresses, but rather want the actual native call to be fast. > > In that context the arming side of handshakes/safepoints could use sys_membarrier where there is currently a StoreLoad fence. 
That way we could remove the StoreLoad fence on the back edge of the native transition, which is likely what actually costs something (last time I checked). > > In general, I'm not sure that this is a worthwhile tradeoff as the amortized cost of fencing has to sum up to the cost of the bigger hammer to be worth it. That would be a lot of native calls to pay for itself. But I suppose that alternative should at least be mentioned as it is a perfectly safe way of speeding up all native calls without resorting to cheating. The single thread handshake would be the most painful in this approach as we would use global synchronization to poke a single thread, unless we shot a signal or something instead for that use case. > > /Erik > > On 4 Jul 2022, at 23:47, Erik Osterlund wrote: > > Hi, > > Here is a clarification on the ZGC interactions. > > The initial form of JNI critical native calls was implemented as an internal thing for SPARC crypto libraries, private to the JDK. JNI calls on SPARC involved flushing register windows, which was actually rather slow. > > This form came with a mechanism for lazily activating the GC locker for primitive arrays that the crypto code needed direct access to. This essentially deferred invoking the GC locker from the Java thread to the safepoint synchronizer. > > The problematic aspect for generational ZGC was the async GC locker interactions. Its implication is that each GC safepoint might fail, because the GC locker can't be locked out before the safepoint is synchronized, so you end up instead trying to lock it inside GC safepoints, only to find that you couldn't. > > The failed GC safepoints lead to GC operations instead being started asynchronously from the GC locker. That was easier to deal with for the mainline version of ZGC since there was only one type of GC: full GCs. So we coped.
> > With generational ZGC, the asynchronous operation has to figure out if it should poke the minor (young) and/or major (young + old) GC drivers. That problem is not easy to solve. However, with JNI critical natives gone, the entire GC locker for ZGC is just a simple readers-writer lock, where critical native functions use the readers lock and the GC operations use the writer lock. The GC safepoints can't fail. > > With the new implementation that avoids doing a transition to native at all, the mentioned problem no longer occurs, as the safepoint synchronizer won't allow safepoints to creep in right in the middle of all this. So it would seem we are okay with that. So I think as long as we don't go with the previous async GC locker solution, we can remove ZGC interactions from the equation. > > However, you obviously get a trust problem instead with this flavour of cheating the system. Anything that takes a longish time in a critical native function without a native transition is going to be a disaster and hang the entire JVM. That is typically something we do not take lightly and is indeed why we have native transitions. > > So I would be delighted if we didn't resurrect ways of cheating the system anyway, unless this is absolutely... critical. It took a long time to get rid of the cheats. > > /Erik > > On 4 Jul 2022, at 18:07, Maurizio Cimadamore wrote: > > Hi, > while I'm not an expert with some of the IO calls you mention (some of my colleagues are more knowledgeable in this area, so I'm sure they will have more info), my general sense is that, as with getrusage, if there is a system call involved, you already pay a hefty price for the user to kernel transition. On my machine this seems to cost around 200ns. In these cases, using JNI critical to shave off a dozen nanoseconds (at best!) seems just not worth it.
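The per-call cost argument above can be checked on the native side with a small timing loop. The following is a rough sketch only, assuming a POSIX system that provides `getrusage` in `<sys/resource.h>`; the function name `average_getrusage_nanos` is made up for illustration. It measures the native call alone and says nothing about the JNI transition cost itself, and results will vary with machine and kernel.

```cpp
#include <cassert>
#include <chrono>
#include <sys/resource.h>

// Average wall-clock cost of one getrusage(RUSAGE_SELF) call, in nanoseconds.
// Returns a negative value if a call fails or if iterations is not positive.
double average_getrusage_nanos(int iterations) {
  if (iterations <= 0) {
    return -1.0;
  }
  struct rusage usage{};
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    // External call; the compiler cannot remove it from the loop.
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
      return -1.0;
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

Comparing this number with the cost of the same call made through JNI gives a feel for how much of the total is the transition and how much is the call itself.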
> > So, of the functions in your list, the ones in which I *believe* dropping transitions would have the most effect are (if we exclude getpid, for which another approach is possible) clock_gettime and getcpu, I believe, as they might use vdso [1], which typically brings the performance of these call closer to calls to shared lib functions. > > If you have examples e.g. where performance of recvmsg (or related calls) varies significantly between base JNI and critical JNI, please send them our way; I'm sure some of my colleagues would be intersted to take a look. > > Popping back a couple of levels, I think it would be helpful to also define what's an acceptable regression in this context. Of course, in an ideal world, we'd like to see no performance regression at all. But JNI critical is an unsupported interface, which might misbehave with modern garbage collectors (e.g. ZGC) and that requires quite a bit of internal complexity which might, in the medium/long run, hinder the evolution of the Java platform (all these things have _some_ cost, even if the cost is not directly material to developers). In this vein, I think calls like clock_gettime tend to be more problematic: as they complete very quickly, you see the cost of transitions a lot more. In other cases, where syscalls are involved, the cost associated to transitions are more likely to be "in the noise". Of course if we look at absolute numbers, dropping transitions would always yield "faster" code; but at the same time, going from 250ns to 245ns is very unlikely to result in visible performance difference when considering an application as a whole, so I think it's critical here to decide _which_ use cases to prioritize. > > I think a good outcome of this discussion would be if we could come to some shared understanding of which native calls are truly problematic (e.g. 
clock_gettime-like), and then for the JDK to provide better (and more maintainable) alternatives for those (which might even be faster than using critical JNI). > > Thanks > Maurizio > > [1] - https://man7.org/linux/man-pages/man7/vdso.7.html > > On 04/07/2022 12:23, Wojciech Kudla wrote: > > Thanks Maurizio, > > I raised this case mainly about clock_gettime and recvmsg/sendmsg, I think we're focusing on the wrong things here. Feel free to drop the two syscalls from the discussion entirely, but the main usecases I have been presenting throughout this thread definitely stand. > > Thanks > > > On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore wrote: >> >> Hi Wojtek, >> thanks for sharing this list, I think this is a good starting point to understand more about your use case. >> >> Last week I've been looking at "getrusage" (as you mentioned it in an earlier email), and I was surprised to see that the call took a pointer to a (fairly big) struct which then needed to be initialized with some thread-local state: >> >> https://man7.org/linux/man-pages/man2/getrusage.2.html >> >> I've looked at the implementation, and it seems to be doing memset on the user-provided struct pointer, plus all the fields assignment. Eyeballing the implementation, this does not seem to me like a "classic" use case where dropping transition would help much. I mean, surely dropping transitions would help shaving some nanoseconds off the call, but it doesn't seem to me that the call would be shortlived enough to make a difference. Do you have some benchmarks on this one? I did some [1] and the call overhead seemed to come up at 260ns/op - w/o transition you might perhaps be able to get to 250ns, but that's in the noise? >> >> As for getpid, note that you can do (since Java 9): >> >> ProcessHandle.current().pid(); >> >> I believe the impl caches the result, so it shouldn't even make the native call. 
>> >> Maurizio >> >> [1] - http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >> >> On 02/07/2022 07:42, Wojciech Kudla wrote: >> >> Hi Maurizio, >> >> Thanks for staying on this. >> >> > Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact in the performance of your application? >> >> From the top of my head: >> clock_gettime >> recvmsg >> recvmmsg >> sendmsg >> sendmmsg >> select >> getpid >> getcpu >> getrusage >> >> > Also, could you please tell us whether any of these calls need to interact with Java arrays? >> No arrays or objects of any type involved. Everything happens by the means of passing raw pointers as longs and using other primitive types as function arguments. >> >> > In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? >> Criticial JNI natives are used solely to remove the cost of transitions. We don't get anywhere near java heap in native code. >> >> In general I think it makes a lot of sense for Java as a language/platform to have some guards around unsafe code, but on the other hand the popularity of libraries employing Unsafe and their success in more performance-oriented corners of software engineering is a clear indicator there is a need for the JVM to provide access to more low-level primitives and mechanisms. >> I think it's entirely fair to tell developers that all bets are off when they get into some non-idiomatic scenarios but please don't take away a feature that greatly contributed to Java's success. >> >> Kind regards, >> Wojtek >> >> On Wed, Jun 29, 2022 at 5:20 PM Maurizio Cimadamore wrote: >>> >>> Hi Wojciech, >>> picking up this thread again. After some internal discussion, we realize that we don't know enough about your use case. 
While re-enabling JNI critical would obviously provide a quick fix, we're afraid that (a) developers might end up depending on JNI critical when they don't need to (perhaps also unaware of the consequences of depending on it) and (b) that there might actually be _better_ (as in: much faster) solutions than using critical native calls to address at least some of your use cases (that seemed to be the case with the clock_gettime example you mentioned). Could you please provide a rough list of the native calls you make where you believe critical JNI is having a real impact on the performance of your application? Also, could you please tell us whether any of these calls need to interact with Java arrays? In other words, do you use critical JNI to remove the cost associated with thread transitions, or are you also taking advantage of accessing on-heap memory _directly_ from native code? >>> >>> Regards >>> Maurizio >>> >>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>> >>> Hi Mark, >>> >>> Thanks for your input and apologies for the delayed response. >>> >>> > If the platform included, say, an intrinsified System.nanoRealTime() >>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>> that help developers in your unnamed industry? >>> >>> Exposing a realtime clock with nanosecond granularity in the JDK would be a great step forward. I should have made it clear that I represent the fintech corner (investment banking to be exact) but the issues my message touches upon span areas such as HPC, audio processing, gaming, and the defense industry so it's not like we have an isolated case. >>> >>> > In a similar vein, if people are finding it necessary to "replace parts >>> of NIO with hand-crafted native code"
then it would be interesting to >>> understand what their requirements are >>> >>> As for the other example I provided with making very short-lived syscalls such as recvmsg/recvmmsg, the premise is getting access to hardware timestamps on the ingress and egress ends as well as enabling batch receive with a single syscall and otherwise exploiting features unavailable from the JDK (like access to the CMSG interface, scatter/gather, etc). >>> There are also other examples of calls that we'd love to make often and at the lowest possible cost (e.g. getrusage) but I'm not sure if there's a strong case for some of these ideas, that's why it might be worth looking into a more generic approach for performance-sensitive code. >>> Hope this does a better job of explaining where we're coming from than my previous messages. >>> >>> Thanks, >>> W >>> >>> On Tue, Jun 7, 2022 at 6:31 PM wrote: >>>> >>>> 2022/6/6 0:24:17 -0700, wkudla.kernel at gmail.com: >>>> >> Yes for System.nanoTime(), but System.currentTimeMillis() reports >>>> >> CLOCK_REALTIME. >>>> > >>>> > Unfortunately System.currentTimeMillis() offers only millisecond >>>> > granularity which is the reason why our industry has to resort to >>>> > clock_gettime. >>>> >>>> If the platform included, say, an intrinsified System.nanoRealTime() >>>> method that returned clock_gettime(CLOCK_REALTIME), how much would >>>> that help developers in your unnamed industry? >>>> >>>> In a similar vein, if people are finding it necessary to "replace parts >>>> of NIO with hand-crafted native code" then it would be interesting to >>>> understand what their requirements are. Some simple enhancements to >>>> the NIO API would be much less costly to design and implement than a >>>> generalized user-level native-call intrinsification mechanism.
>>>> >>>> - Mark From rpressler at openjdk.org Thu Jul 7 10:07:32 2022 From: rpressler at openjdk.org (Ron Pressler) Date: Thu, 7 Jul 2022 10:07:32 GMT Subject: RFR: 8286957: Held monitor count [v8] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:43:35 GMT, Robbin Ehn wrote: >> The current implementation do not count all monitor enter, counts high up in abstraction and causes a performance regression on aarch64 with some benchmarks due to C2 changes. >> >> This change makes the counting exact by pushing the counting down in the abstraction. >> The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". >> >> An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. >> >> Fixed aarch64, x64, x86 and zero. >> >> Passes t1-8 > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed strw, zero rename and made methods return 64 bit counter in all cases The changes to continuationEntry.hpp and continuationFreezeThaw.cpp look fine. I would suggest testing all tiers (1-5) in the Loom repo, too. ------------- PR: https://git.openjdk.org/jdk/pull/8945 From stuefe at openjdk.org Thu Jul 7 09:45:36 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Jul 2022 09:45:36 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v3] In-Reply-To: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> References: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> Message-ID: On Thu, 7 Jul 2022 09:29:12 GMT, Jie Fu wrote: >> Hi all, >> >> ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. 
>> >> For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. >> And `line` will be allocated by `getline` @line84. >> >> >> 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { >> 54 char* line_mountpoint = NULL; >> 55 char* line_filesystem = NULL; >> 56 >> 57 // Parse line and return a newly allocated string containing the mount point if >> 58 // the line contains a matching filesystem and the mount point is accessible by >> 59 // the current user. >> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || >> 61 strcmp(line_filesystem, filesystem) != 0 || >> 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { >> 63 // Not a matching or accessible filesystem >> 64 os::free(line_mountpoint); >> 65 line_mountpoint = NULL; >> 66 } >> 67 >> 68 os::free(line_filesystem); >> 69 >> 70 return line_mountpoint; >> 71 } >> 72 >> 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { >> 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); >> 75 if (fd == NULL) { >> 76 ZErrno err; >> 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); >> 78 return; >> 79 } >> 80 >> 81 char* line = NULL; >> 82 size_t length = 0; >> 83 >> 84 while (getline(&line, &length, fd) != -1) { >> 85 char* const mountpoint = get_mountpoint(line, filesystem); >> 86 if (mountpoint != NULL) { >> 87 mountpoints->append(mountpoint); >> 88 } >> 89 } >> 90 >> 91 os::free(line); >> 92 fclose(fd); >> 93 } >> >> >> See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 >> >> That means we have raw `::malloc() -> os::free()`, which is unbalanced. >> Raw `::malloc()` does not write the header `os::free()` expects. >> If NMT is on, we assert now, because NMT does not find its header in os::free(). >> >> >> The fix just reverts `os::free()` to `::free()`. 
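The pairing rule behind this fix can be sketched outside of HotSpot. The code below is an illustration under stated assumptions, not the JDK implementation: the `tracked` namespace is a hypothetical allocator that, like the NMT-aware `os::malloc()`/`os::free()` pair described above, places a header in front of each block, while `CFree`, `CString` and `duplicate` are made-up helpers for memory that comes from the C library (as with `getline` or `sscanf` with `%ms`). Each pointer has to go back to the family that produced it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical tracking allocator: every block is preceded by a header.
// This only mimics the idea of an NMT malloc header, not its real layout.
namespace tracked {

struct alignas(std::max_align_t) Header {
  std::uint64_t magic;
  std::size_t   size;
};

constexpr std::uint64_t kMagic = 0x4E4D5448445221ULL;

void* alloc(std::size_t size) {
  void* raw = std::malloc(sizeof(Header) + size);
  if (raw == nullptr) {
    return nullptr;
  }
  Header* header = new (raw) Header{kMagic, size};
  return header + 1;  // payload starts right after the header
}

// Only valid for pointers returned by tracked::alloc. A pointer from plain
// malloc has no header in front of it, so it must never be passed here.
void release(void* payload) {
  if (payload == nullptr) {
    return;
  }
  Header* header = static_cast<Header*>(payload) - 1;
  assert(header->magic == kMagic && "pointer did not come from tracked::alloc");
  std::free(header);
}

}  // namespace tracked

// Deleter for memory handed out by the C library itself.
struct CFree {
  void operator()(void* p) const noexcept { std::free(p); }
};
using CString = std::unique_ptr<char, CFree>;

// Stand-in for a libc routine that returns a malloc'ed string.
CString duplicate(const char* text) {
  const std::size_t bytes = std::strlen(text) + 1;
  char* copy = static_cast<char*>(std::malloc(bytes));
  if (copy != nullptr) {
    std::memcpy(copy, text, bytes);
  }
  return CString(copy);
}
```

Tying the deleter to the pointer type, as `CString` does here, is one way to make it hard to hand libc-allocated memory to the wrong release function.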
>> >> Testing: >> - hotspot/jtreg/gc/z on Linux/x64, all passed >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment Still good. ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/9387 From jiefu at openjdk.org Thu Jul 7 09:29:12 2022 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 7 Jul 2022 09:29:12 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v3] In-Reply-To: References: Message-ID: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> > Hi all, > > ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. > > For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. > And `line` will be allocated by `getline` @line84. > > > 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { > 54 char* line_mountpoint = NULL; > 55 char* line_filesystem = NULL; > 56 > 57 // Parse line and return a newly allocated string containing the mount point if > 58 // the line contains a matching filesystem and the mount point is accessible by > 59 // the current user. 
> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || > 61 strcmp(line_filesystem, filesystem) != 0 || > 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { > 63 // Not a matching or accessible filesystem > 64 os::free(line_mountpoint); > 65 line_mountpoint = NULL; > 66 } > 67 > 68 os::free(line_filesystem); > 69 > 70 return line_mountpoint; > 71 } > 72 > 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { > 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); > 75 if (fd == NULL) { > 76 ZErrno err; > 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); > 78 return; > 79 } > 80 > 81 char* line = NULL; > 82 size_t length = 0; > 83 > 84 while (getline(&line, &length, fd) != -1) { > 85 char* const mountpoint = get_mountpoint(line, filesystem); > 86 if (mountpoint != NULL) { > 87 mountpoints->append(mountpoint); > 88 } > 89 } > 90 > 91 os::free(line); > 92 fclose(fd); > 93 } > > > See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 > > That means we have raw `::malloc() -> os::free()`, which is unbalanced. > Raw `::malloc()` does not write the header `os::free()` expects. > If NMT is on, we assert now, because NMT does not find its header in os::free(). > > > The fix just reverts `os::free()` to `::free()`. > > Testing: > - hotspot/jtreg/gc/z on Linux/x64, all passed > > Thanks. 
> Best regards, > Jie Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Address review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9387/files - new: https://git.openjdk.org/jdk/pull/9387/files/8ae3bfcf..26597295 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9387&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9387&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9387.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9387/head:pull/9387 PR: https://git.openjdk.org/jdk/pull/9387 From lucy at openjdk.org Thu Jul 7 09:40:30 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 7 Jul 2022 09:40:30 GMT Subject: RFR: JDK-8289799: Build warning in methodData.cpp memset zero-length parameter In-Reply-To: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> References: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> Message-ID: On Wed, 6 Jul 2022 07:23:17 GMT, Thomas Stuefe wrote: > Trivial fix for a compiler warning we see in our CI on Fedora 12 with GCC 8.3: > > > void Copy::pd_zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/cpu/x86/copy_x86.hpp:59:15, > inlined from 'static void Copy::zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/share/utilities/copy.hpp:298:21, > inlined from 'void MethodData::initialize()' at Looks good. ------------- Marked as reviewed by lucy (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9390 From mgronlun at openjdk.org Thu Jul 7 10:40:26 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 7 Jul 2022 10:40:26 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v2] In-Reply-To: References: <-fnY4EALUVaTIhtaGV0i-Fi02mH6B5zOkJIq4ebU_w8=.9f8ed319-f37e-4c27-aca3-808c74e61b95@github.com> Message-ID: On Thu, 7 Jul 2022 08:27:54 GMT, Lutz Schmidt wrote: > So is the common opinion to get back to a separate JIT start event (I think that naming is preferred over JIT restart, am I correct)? Additionally we add CodeCache::max_capacity() at both JIT start and EventCodeCacheFull? I think it is fine to use the term "JIT restart" because it is in use both in the code as well as in the output of log statements. Another reason would be that the first "JIT start" event would always be missing. It was only the meaning that got me a bit confused. Yes, a separate event is preferable, having no duration (startTime=false), no stack trace (stackTrace=false).
The .jfc configs only need one element: true ------------- PR: https://git.openjdk.org/jdk/pull/9334 From stuefe at openjdk.org Thu Jul 7 09:46:41 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Jul 2022 09:46:41 GMT Subject: RFR: JDK-8289799: Build warning in methodData.cpp memset zero-length parameter In-Reply-To: References: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> Message-ID: On Wed, 6 Jul 2022 07:43:03 GMT, Jie Fu wrote: >> Trivial fix for a compiler warning we see in our CI on Fedora 12 with GCC 8.3: >> >> >> void Copy::pd_zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/cpu/x86/copy_x86.hpp:59:15, >> inlined from 'static void Copy::zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/share/utilities/copy.hpp:298:21, >> inlined from 'void MethodData::initialize()' at > > Looks good to me. Thanks @DamonFool and @RealLucy ! ------------- PR: https://git.openjdk.org/jdk/pull/9390 From duke at openjdk.org Thu Jul 7 11:00:42 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 7 Jul 2022 11:00:42 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2 In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: On Thu, 7 Jul 2022 04:10:57 GMT, Yi-Fan Tsai wrote: > A trampoline stub could be generated for each runtime call. These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. 
> > Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. Hi @yftsai, Thank you for the PR. From the changes I see it covers both C1 and C2. Could you please update the JBS issues to reflect this? ------------- PR: https://git.openjdk.org/jdk/pull/9405 From jiefu at openjdk.org Thu Jul 7 09:29:13 2022 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 7 Jul 2022 09:29:13 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v2] In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 08:10:55 GMT, David Holmes wrote: > The comment could be lifted to before sscanf so that it covers both uses of ::free. Updated. Thanks @dholmes-ora. ------------- PR: https://git.openjdk.org/jdk/pull/9387 From dnsimon at openjdk.org Thu Jul 7 11:19:44 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 7 Jul 2022 11:19:44 GMT Subject: RFR: 8282420: JFR: Remove event handlers [v6] In-Reply-To: References: Message-ID: On Tue, 10 May 2022 14:42:58 GMT, Erik Gahlin wrote: >> Hi, >> >> Could I have a review of a fix that removes event handler classes for JFR. Bytecode for event instrumentation is now only added to the event class. Benefits are: >> >> - No class memory leak in the boot class loader. >> - Reduce overhead from class loading during startup, which is important with additional JDK events that are coming (VirtualThreadStart etc.) >> - One less frame to traverse when recording a Java stack trace. >> >> Future benefits are: >> >> - Simplify creating instrumentation as a build step. See https://bugs.openjdk.java.net/browse/JDK-8279354 >> - Simplify implementation of Event Metrics.
See https://bugs.openjdk.java.net/browse/JDK-8224749 >> >> When the Security Manager is removed, much of the code being added for security reasons can be deleted. >> >> There are few JFR hooks when code is being linked. Plan is to also use these for other events later. >> >> Testing: tier 1-4, jdk/jdk/jfr >> >> Thanks >> Erik > > Erik Gahlin has updated the pull request incrementally with one additional commit since the last revision: > > Minor fixes test/jdk/jdk/jfr/jvm/TestGetEventWriter.java line 114: > 112: Event e = newEventObject("RegisteredTrueEvent"); > 113: try { > 114: e.commit(); // throws When I modify this test to print the IllegalAccessError, I get: java.lang.IllegalAccessError: class jdk.jfr.jvm.RegisteredTrueEvent (in unnamed module @0x4ed05077) cannot access class jdk.jfr.internal.event.EventWriterFactory (in module jdk.jfr) because module jdk.jfr does not export jdk.jfr.internal.event to unnamed module @0x4ed05077 at jdk.jfr.jvm.RegisteredTrueEvent.commit(RegisteredTrueEvent.java:31) at jdk.jfr.jvm.TestGetEventWriter.testRegisteredTrueEvent(TestGetEventWriter.java:104) I was assuming this test is attempting to instead trigger the IAE thrown when linking the commit call. This is achieved by adding `-vmoptions:--add-exports=jdk.jfr/jdk.jfr.internal.event=ALL-UNNAMED` to the jtreg command line: java.lang.IllegalAccessError: illegal access linking method 'jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long)' at jdk.jfr.jvm.RegisteredTrueEvent.commit(RegisteredTrueEvent.java:31) at jdk.jfr.jvm.TestGetEventWriter.testRegisteredTrueEvent(TestGetEventWriter.java:104) Maybe this should be added to the `@run` directives? 
------------- PR: https://git.openjdk.org/jdk/pull/8383 From stuefe at openjdk.org Thu Jul 7 09:46:43 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Jul 2022 09:46:43 GMT Subject: Integrated: JDK-8289799: Build warning in methodData.cpp memset zero-length parameter In-Reply-To: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> References: <_UgeK05iMUjHEZccIMx_qP8a9YKZZDeYDCJNtQzUVs0=.a1e1b39c-f059-4aab-ace8-571b03d87421@github.com> Message-ID: On Wed, 6 Jul 2022 07:23:17 GMT, Thomas Stuefe wrote: > Trivial fix for a compiler warning we see in our CI on Fedora 12 with GCC 8.3: > > > void Copy::pd_zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/cpu/x86/copy_x86.hpp:59:15, > inlined from 'static void Copy::zero_to_bytes(void*, size_t)' at /home/ubuntu/client_home/workspace/build-user-branch-linux_x86_64/SapMachine/src/hotspot/share/utilities/copy.hpp:298:21, > inlined from 'void MethodData::initialize()' at This pull request has now been integrated. Changeset: cce77a70 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/cce77a700141a854bafaa5ccb33db026affcf322 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8289799: Build warning in methodData.cpp memset zero-length parameter Reviewed-by: jiefu, lucy ------------- PR: https://git.openjdk.org/jdk/pull/9390 From aph at openjdk.org Thu Jul 7 12:30:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 7 Jul 2022 12:30:41 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2 In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: On Thu, 7 Jul 2022 04:10:57 GMT, Yi-Fan Tsai wrote: > A trampoline stub could be generated for each runtime call. 
These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. > > Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 52: > 50: } > 51: }; > 52: requests->sort(by_dest); While undoubtedly correct, the approach of sorting the requests then deleting duplicates is too heavyweight for this application. A hash table with linear probing (and chaining for calls to the same destination) would be a simple way to solve the problem. ------------- PR: https://git.openjdk.org/jdk/pull/9405 From stuefe at openjdk.org Thu Jul 7 12:45:40 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Jul 2022 12:45:40 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 01:11:50 GMT, Jie Fu wrote: >>> I'm curious, which tests did show the bug? Because I ran GHAs and a selection of our nightlies. >> >> This bug can be exposed by `java -XX:+UseZGC` on some of our cloud machines with os/tlinux, not all the platforms would crash (e.g., I didn't reproduce it on Ubuntu20.04). > >> Hi @DamonFool , >> >> thanks for the quick fix. Embarrassing, I should have catched this. >> >> Can you please add `#include "utilities/globalDefinitions.hpp"` for the ALLOW_... macros? >> >> Otherwise, apart from the comment changes below, fine. >> >> Cheers, Thomas > > Thanks @tstuefe for your review. > > All the comments have been addressed. > Thanks. Hi @DamonFool, can you pls integrate this fix? 
------------- PR: https://git.openjdk.org/jdk/pull/9387 From duke at openjdk.org Thu Jul 7 12:48:51 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 7 Jul 2022 12:48:51 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2 In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: On Thu, 7 Jul 2022 04:10:57 GMT, Yi-Fan Tsai wrote: > A trampoline stub could be generated for each runtime call. These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. > > Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 29: > 27: #include "asm/macroAssembler.hpp" > 28: > 29: void CodeBuffer::shared_stub_to_runtime_for(address dest, int caller_offset) { We create shared trampolines. I recommend to use `shared_trampoline_for` instead. src/hotspot/cpu/aarch64/codeBuffer_aarch64.hpp line 30: > 28: > 29: public: > 30: class SharedStubToRuntimeCallRequest { Please rename to `SharedTrampolineRequest`. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 651: > 649: #endif > 650: if (!in_scratch_emit_size) { > 651: if (CodeBuffer::supports_shared_stubs() && entry.rspec().type() == relocInfo::runtime_call_type) { No need for `CodeBuffer::supports_shared_stubs()` because we know we have an implementation. Please convert it to `assert`. 
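The review comment on `requests->sort(by_dest)` earlier in this thread suggests grouping trampoline requests by destination with a hash table instead of sorting and then removing duplicates. A rough standalone sketch of that idea follows. It assumes `std::unordered_map` as a stand-in for whatever table HotSpot would actually use (it chains per bucket rather than using linear probing), and `address`, `TrampolineRequest`, and `group_by_destination` are hypothetical names, not the PR's code.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not HotSpot types.
using address = std::uintptr_t;

struct TrampolineRequest {
  address dest;          // callee the trampoline jumps to
  int     caller_offset; // offset of the call site in the code buffer
};

// Group call-site offsets by destination in a single pass.
// Each distinct key then needs exactly one shared trampoline.
static std::unordered_map<address, std::vector<int>>
group_by_destination(const std::vector<TrampolineRequest>& requests) {
  std::unordered_map<address, std::vector<int>> by_dest;
  for (const TrampolineRequest& r : requests) {
    assert(r.caller_offset >= 0);
    by_dest[r.dest].push_back(r.caller_offset);
  }
  return by_dest;
}
```

Iterating over the resulting map yields one entry per distinct callee together with all call sites that should be relocated to the shared stub. Iteration order is unspecified, so a real implementation that needs deterministic code layout would have to account for that.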
------------- PR: https://git.openjdk.org/jdk/pull/9405 From dholmes at openjdk.org Thu Jul 7 12:54:51 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 7 Jul 2022 12:54:51 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 15:07:37 GMT, Coleen Phillimore wrote: > This trivial change just adds a comment to Klass::is_loader_alive. src/hotspot/share/oops/klass.inline.hpp line 48: > 46: // unloading, and hence during concurrent class unloading. > 47: // This returns false if the Klass is unloaded, or about to be unloaded because the holder of > 48: // the CLD is strongly reachable. Is that a typo? I would expect being unreachable to lead to unloading? ------------- PR: https://git.openjdk.org/jdk/pull/9400 From coleenp at openjdk.org Thu Jul 7 12:54:52 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Jul 2022 12:54:52 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 12:50:09 GMT, David Holmes wrote: >> This trivial change just adds a comment to Klass::is_loader_alive. > > src/hotspot/share/oops/klass.inline.hpp line 48: > >> 46: // unloading, and hence during concurrent class unloading. >> 47: // This returns false if the Klass is unloaded, or about to be unloaded because the holder of >> 48: // the CLD is strongly reachable. > > Is that a typo? I would expect being unreachable to lead to unloading? It is a typo thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/9400 From mdoerr at openjdk.org Thu Jul 7 13:00:09 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Jul 2022 13:00:09 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v3] In-Reply-To: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> References: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> Message-ID: On Thu, 7 Jul 2022 09:29:12 GMT, Jie Fu wrote: >> Hi all, >> >> ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. >> >> For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. >> And `line` will be allocated by `getline` @line84. >> >> >> 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { >> 54 char* line_mountpoint = NULL; >> 55 char* line_filesystem = NULL; >> 56 >> 57 // Parse line and return a newly allocated string containing the mount point if >> 58 // the line contains a matching filesystem and the mount point is accessible by >> 59 // the current user. 
>> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || >> 61 strcmp(line_filesystem, filesystem) != 0 || >> 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { >> 63 // Not a matching or accessible filesystem >> 64 os::free(line_mountpoint); >> 65 line_mountpoint = NULL; >> 66 } >> 67 >> 68 os::free(line_filesystem); >> 69 >> 70 return line_mountpoint; >> 71 } >> 72 >> 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { >> 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); >> 75 if (fd == NULL) { >> 76 ZErrno err; >> 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); >> 78 return; >> 79 } >> 80 >> 81 char* line = NULL; >> 82 size_t length = 0; >> 83 >> 84 while (getline(&line, &length, fd) != -1) { >> 85 char* const mountpoint = get_mountpoint(line, filesystem); >> 86 if (mountpoint != NULL) { >> 87 mountpoints->append(mountpoint); >> 88 } >> 89 } >> 90 >> 91 os::free(line); >> 92 fclose(fd); >> 93 } >> >> >> See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 >> >> That means we have raw `::malloc() -> os::free()`, which is unbalanced. >> Raw `::malloc()` does not write the header `os::free()` expects. >> If NMT is on, we assert now, because NMT does not find its header in os::free(). >> >> >> The fix just reverts `os::free()` to `::free()`. >> >> Testing: >> - hotspot/jtreg/gc/z on Linux/x64, all passed >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment Fix is good. Please check my 2 minor questions. src/hotspot/os/linux/gc/z/zMountPoint_linux.cpp line 31: > 29: #include "runtime/globals.hpp" > 30: #include "runtime/os.hpp" > 31: #include "utilities/globalDefinitions.hpp" Did you add this by intention or was it added by your IDE? 
src/hotspot/os/linux/gc/z/zMountPoint_linux.cpp line 93: > 91: } > 92: > 93: // readline will return malloced memory. Need raw ::free, not os::free. You mean `getline`? ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/9387 From jiefu at openjdk.org Thu Jul 7 13:00:11 2022 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 7 Jul 2022 13:00:11 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 In-Reply-To: References: Message-ID: <1quNS_YF4X87Wo5VJkp2J3lSQCKiY0sid9qawv2hY6o=.87108330-98a5-47aa-83db-58252bf119f3@github.com> On Thu, 7 Jul 2022 01:11:50 GMT, Jie Fu wrote: >>> I'm curious, which tests did show the bug? Because I ran GHAs and a selection of our nightlies. >> >> This bug can be exposed by `java -XX:+UseZGC` on some of our cloud machines with os/tlinux, not all the platforms would crash (e.g., I didn't reproduce it on Ubuntu20.04). > >> Hi @DamonFool , >> >> thanks for the quick fix. Embarrassing, I should have catched this. >> >> Can you please add `#include "utilities/globalDefinitions.hpp"` for the ALLOW_... macros? >> >> Otherwise, apart from the comment changes below, fine. >> >> Cheers, Thomas > > Thanks @tstuefe for your review. > > All the comments have been addressed. > Thanks. > Hi @DamonFool, can you pls integrate this fix? Done. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/9387 From jiefu at openjdk.org Thu Jul 7 13:00:12 2022 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 7 Jul 2022 13:00:12 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v3] In-Reply-To: References: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> Message-ID: On Thu, 7 Jul 2022 12:50:09 GMT, Martin Doerr wrote: > Did you add this by intention or was it added by your IDE? It was added by intention. Any question? Thanks. > You mean `getline`? Yes. 
Sorry, it's too late. Shall we open another issue to fix it? ------------- PR: https://git.openjdk.org/jdk/pull/9387 From jiefu at openjdk.org Thu Jul 7 13:00:13 2022 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 7 Jul 2022 13:00:13 GMT Subject: Integrated: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 02:17:39 GMT, Jie Fu wrote: > Hi all, > > ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. > > For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. > And `line` will be allocated by `getline` @line84. > > > 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { > 54 char* line_mountpoint = NULL; > 55 char* line_filesystem = NULL; > 56 > 57 // Parse line and return a newly allocated string containing the mount point if > 58 // the line contains a matching filesystem and the mount point is accessible by > 59 // the current user. 
> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || > 61 strcmp(line_filesystem, filesystem) != 0 || > 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { > 63 // Not a matching or accessible filesystem > 64 os::free(line_mountpoint); > 65 line_mountpoint = NULL; > 66 } > 67 > 68 os::free(line_filesystem); > 69 > 70 return line_mountpoint; > 71 } > 72 > 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { > 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); > 75 if (fd == NULL) { > 76 ZErrno err; > 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); > 78 return; > 79 } > 80 > 81 char* line = NULL; > 82 size_t length = 0; > 83 > 84 while (getline(&line, &length, fd) != -1) { > 85 char* const mountpoint = get_mountpoint(line, filesystem); > 86 if (mountpoint != NULL) { > 87 mountpoints->append(mountpoint); > 88 } > 89 } > 90 > 91 os::free(line); > 92 fclose(fd); > 93 } > > > See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 > > That means we have raw `::malloc() -> os::free()`, which is unbalanced. > Raw `::malloc()` does not write the header `os::free()` expects. > If NMT is on, we assert now, because NMT does not find its header in os::free(). > > > The fix just reverts `os::free()` to `::free()`. > > Testing: > - hotspot/jtreg/gc/z on Linux/x64, all passed > > Thanks. > Best regards, > Jie This pull request has now been integrated. 
Changeset: 77ad998b Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/77ad998b6e741f7cd7cdd52155c024bbc77f2027 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 Reviewed-by: stuefe, dholmes, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/9387 From coleenp at openjdk.org Thu Jul 7 13:01:38 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Jul 2022 13:01:38 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive [v2] In-Reply-To: References: Message-ID: <_dOmbbRg9ysGMQ2QgyeEdLuz8lYj3u90e9gzi-cvGQI=.f54df3de-48a7-4fc3-9ff6-19bda30f8213@github.com> > This trivial change just adds a comment to Klass::is_loader_alive. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix typo. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9400/files - new: https://git.openjdk.org/jdk/pull/9400/files/d3ab126a..f5701a34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9400&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9400&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9400.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9400/head:pull/9400 PR: https://git.openjdk.org/jdk/pull/9400 From dholmes at openjdk.org Thu Jul 7 13:01:38 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 7 Jul 2022 13:01:38 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive [v2] In-Reply-To: <_dOmbbRg9ysGMQ2QgyeEdLuz8lYj3u90e9gzi-cvGQI=.f54df3de-48a7-4fc3-9ff6-19bda30f8213@github.com> References: <_dOmbbRg9ysGMQ2QgyeEdLuz8lYj3u90e9gzi-cvGQI=.f54df3de-48a7-4fc3-9ff6-19bda30f8213@github.com> Message-ID: On Thu, 7 Jul 2022 12:57:51 GMT, Coleen Phillimore wrote: >> This trivial change just adds a comment to Klass::is_loader_alive. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo. Looks good and trivial. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9400 From coleenp at openjdk.org Thu Jul 7 13:01:40 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Jul 2022 13:01:40 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 15:07:37 GMT, Coleen Phillimore wrote: > This trivial change just adds a comment to Klass::is_loader_alive. Thank you David! ------------- PR: https://git.openjdk.org/jdk/pull/9400 From stuefe at openjdk.org Thu Jul 7 13:03:01 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Jul 2022 13:03:01 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v3] In-Reply-To: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> References: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> Message-ID: On Thu, 7 Jul 2022 09:29:12 GMT, Jie Fu wrote: >> Hi all, >> >> ZGC crashes were observed by us after JDK-8289633 due to incorrect use of `os::free()` for mountpoint string handling. >> >> For example, `line_mountpoint` and `line_filesystem` will be allocated by `sscanf` @line60. >> And `line` will be allocated by `getline` @line84. >> >> >> 53 char* ZMountPoint::get_mountpoint(const char* line, const char* filesystem) const { >> 54 char* line_mountpoint = NULL; >> 55 char* line_filesystem = NULL; >> 56 >> 57 // Parse line and return a newly allocated string containing the mount point if >> 58 // the line contains a matching filesystem and the mount point is accessible by >> 59 // the current user. 
>> 60 if (sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms", &line_mountpoint, &line_filesystem) != 2 || >> 61 strcmp(line_filesystem, filesystem) != 0 || >> 62 access(line_mountpoint, R_OK|W_OK|X_OK) != 0) { >> 63 // Not a matching or accessible filesystem >> 64 os::free(line_mountpoint); >> 65 line_mountpoint = NULL; >> 66 } >> 67 >> 68 os::free(line_filesystem); >> 69 >> 70 return line_mountpoint; >> 71 } >> 72 >> 73 void ZMountPoint::get_mountpoints(const char* filesystem, ZArray* mountpoints) const { >> 74 FILE* fd = os::fopen(PROC_SELF_MOUNTINFO, "r"); >> 75 if (fd == NULL) { >> 76 ZErrno err; >> 77 log_error_p(gc)("Failed to open %s: %s", PROC_SELF_MOUNTINFO, err.to_string()); >> 78 return; >> 79 } >> 80 >> 81 char* line = NULL; >> 82 size_t length = 0; >> 83 >> 84 while (getline(&line, &length, fd) != -1) { >> 85 char* const mountpoint = get_mountpoint(line, filesystem); >> 86 if (mountpoint != NULL) { >> 87 mountpoints->append(mountpoint); >> 88 } >> 89 } >> 90 >> 91 os::free(line); >> 92 fclose(fd); >> 93 } >> >> >> See the anaylis of the crash reason in https://bugs.openjdk.org/browse/JDK-8289477 >> >> That means we have raw `::malloc() -> os::free()`, which is unbalanced. >> Raw `::malloc()` does not write the header `os::free()` expects. >> If NMT is on, we assert now, because NMT does not find its header in os::free(). >> >> >> The fix just reverts `os::free()` to `::free()`. >> >> Testing: >> - hotspot/jtreg/gc/z on Linux/x64, all passed >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment Thanks! 
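To make the allocator pairing discussed in this thread concrete, here is a minimal standalone sketch, not the HotSpot code. `getline` and the `%ms` conversion of `sscanf` return memory obtained from the C library allocator, so it is released with raw `::free`. The helper name `mountpoints_for` and the use of `std::vector<std::string>` are assumptions made for illustration; the format string is the one quoted above, and `%ms` assumes a glibc or POSIX.1-2008 C library.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>
#include <sys/types.h>

// Hypothetical helper: collect mount points of the given filesystem type
// from a stream in /proc/self/mountinfo format.
static std::vector<std::string> mountpoints_for(FILE* in, const char* filesystem) {
  assert(in != nullptr);
  std::vector<std::string> result;

  char*  line   = nullptr; // allocated and grown by getline()
  size_t length = 0;

  while (getline(&line, &length, in) != -1) {
    char* mountpoint = nullptr; // allocated by sscanf() via %ms
    char* fstype     = nullptr; // allocated by sscanf() via %ms

    const int matched = sscanf(line, "%*u %*u %*u:%*u %*s %ms %*[^-]- %ms",
                               &mountpoint, &fstype);
    if (matched == 2 && strcmp(fstype, filesystem) == 0) {
      result.push_back(mountpoint); // copies the string
    }

    // Memory came from the C library, so use raw ::free.
    // free(NULL) is a no-op, which covers partial matches.
    ::free(mountpoint);
    ::free(fstype);
  }

  ::free(line); // same rule for the getline() buffer
  return result;
}
```

Unlike `ZMountPoint::get_mountpoint`, this sketch copies the mount point into a `std::string` and skips the `access` check, which keeps ownership simple for the example.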
------------- PR: https://git.openjdk.org/jdk/pull/9387 From mgronlun at openjdk.org Thu Jul 7 13:15:44 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 7 Jul 2022 13:15:44 GMT Subject: RFR: 8282420: JFR: Remove event handlers [v6] In-Reply-To: References: <_nkpPJNh2EvuBJc_uD38QQfTwsTAdm37JJMxFEcGbbE=.2876eb7f-79f1-4b75-9208-f8d9f82c86c7@github.com> Message-ID: On Wed, 6 Jul 2022 20:25:57 GMT, Doug Simon wrote: >> Yes. It's been fixed. See: >> https://github.com/openjdk/jdk/pull/8680 > > Yes, I see the bodies are guarded. I was referring to the declarations here in `jfrResolution.hpp`. Shouldn't it be something like: > > #ifdef COMPILER1 > static void on_c1_resolution(const GraphBuilder * builder, const ciKlass * holder, const ciMethod * target); > #endif > #ifdef COMPILER2 > static void on_c2_resolution(const Parse * parse, const ciKlass * holder, const ciMethod * target); > #endif > > and likewise in `jfr.hpp`: > > #ifdef COMPILER2 > static void on_resolution(const Parse* parse, const ciKlass* holder, const ciMethod* target); > #endif > #ifdef COMPILER1 > static void on_resolution(const GraphBuilder* builder, const ciKlass* holder, const ciMethod* target); > #endif In general, we would like to reduce the amount of conditionalization because it makes the code harder to read and follow. Compilers should not require a definition until a definition is actually needed. Are you seeing compilation errors? ------------- PR: https://git.openjdk.org/jdk/pull/8383 From aph at openjdk.org Thu Jul 7 13:17:20 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 7 Jul 2022 13:17:20 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic Message-ID: The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster.
------------- Commit messages: - 8289743: AArch64: Clean up patching logic - 8289698: AArch64: Need to relativize extended_sp in frame - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - ... and 6 more: https://git.openjdk.org/jdk/compare/77c3bbf1...ee6e4189 Changes: https://git.openjdk.org/jdk/pull/9398/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289743 Stats: 283 lines in 5 files changed: 113 ins; 44 del; 126 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From mdoerr at openjdk.org Thu Jul 7 13:21:54 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Jul 2022 13:21:54 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v3] In-Reply-To: References: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> Message-ID: On Thu, 7 Jul 2022 12:55:02 GMT, Jie Fu wrote: >> src/hotspot/os/linux/gc/z/zMountPoint_linux.cpp line 31: >> >>> 29: #include "runtime/globals.hpp" >>> 30: #include "runtime/os.hpp" >>> 31: #include "utilities/globalDefinitions.hpp" >> >> Did you add this by intention or was it added by your IDE? > >> Did you add this by intention or was it added by your IDE? > > It was added by intention. > Any question? > Thanks. No, that's fine. My VS code often adds it, but it's not strictly required. >> src/hotspot/os/linux/gc/z/zMountPoint_linux.cpp line 93: >> >>> 91: } >>> 92: >>> 93: // readline will return malloced memory. Need raw ::free, not os::free. 
>> >> You mean `getline`? > >> You mean `getline`? > > Yes. > Sorry, it's too late. > > Shall we open another issue to fix it? Not so important. Thanks for fixing the crashes! ------------- PR: https://git.openjdk.org/jdk/pull/9387 From dnsimon at openjdk.org Thu Jul 7 13:41:05 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 7 Jul 2022 13:41:05 GMT Subject: RFR: 8282420: JFR: Remove event handlers [v6] In-Reply-To: References: <_nkpPJNh2EvuBJc_uD38QQfTwsTAdm37JJMxFEcGbbE=.2876eb7f-79f1-4b75-9208-f8d9f82c86c7@github.com> Message-ID: On Thu, 7 Jul 2022 13:12:16 GMT, Markus Grönlund wrote: >> Yes, I see the bodies are guarded. I was referring to the declarations here in `jfrResolution.hpp`. Shouldn't it be something like: >> >> #ifdef COMPILER1 >> static void on_c1_resolution(const GraphBuilder * builder, const ciKlass * holder, const ciMethod * target); >> #endif >> #ifdef COMPILER2 >> static void on_c2_resolution(const Parse * parse, const ciKlass * holder, const ciMethod * target); >> #endif >> >> and likewise in `jfr.hpp`: >> >> #ifdef COMPILER2 >> static void on_resolution(const Parse* parse, const ciKlass* holder, const ciMethod* target); >> #endif >> #ifdef COMPILER1 >> static void on_resolution(const GraphBuilder* builder, const ciKlass* holder, const ciMethod* target); >> #endif > > In general, we would like to reduce the amount of conditionalization because it makes the code harder to read and follow. Compilers should not require a definition until a definition is actually needed. Are you seeing compilation errors? No I'm not which surprised me. I agree though, if the compilers don't complain, fewer conditional macros is better.
------------- PR: https://git.openjdk.org/jdk/pull/8383 From stuefe at openjdk.org Thu Jul 7 13:58:54 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Jul 2022 13:58:54 GMT Subject: [jdk19] RFR: 8289799: Build warning in methodData.cpp memset zero-length parameter Message-ID: Trivial clean backport, prevents gcc warnings on Fedora 12 GCC 8.3. The commit being backported was authored by Thomas Stuefe on 7 Jul 2022 and was reviewed by Jie Fu and Lutz Schmidt. Thanks! ------------- Commit messages: - Backport cce77a700141a854bafaa5ccb33db026affcf322 Changes: https://git.openjdk.org/jdk19/pull/119/files Webrev: https://webrevs.openjdk.org/?repo=jdk19&pr=119&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289799 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk19/pull/119.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/119/head:pull/119 PR: https://git.openjdk.org/jdk19/pull/119 From mgronlun at openjdk.org Thu Jul 7 13:51:45 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 7 Jul 2022 13:51:45 GMT Subject: RFR: 8282420: JFR: Remove event handlers [v6] In-Reply-To: References: <_nkpPJNh2EvuBJc_uD38QQfTwsTAdm37JJMxFEcGbbE=.2876eb7f-79f1-4b75-9208-f8d9f82c86c7@github.com> Message-ID: On Thu, 7 Jul 2022 13:38:50 GMT, Doug Simon wrote: >> In general, we would like to reduce the amount of conditionalization because it makes the code harder to read and follow. Compilers should not require a definition until a definition is actually needed. Are you seeing compilation errors? > > No I'm not which surprised me. I agree though, if the compilers don't complain, fewer conditional macros is better. It is because the bodies in the call chain have been conditionalized out. There is no linking because there are no uses.
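[Editorial sketch] The point above (a declaration needs no definition as long as nothing uses it) can be illustrated with a minimal C++ example. All names here, including the `USE_C1` macro standing in for `COMPILER1`, are hypothetical and are not the actual JFR declarations; this is only a sketch of the language rule, assuming neither macro is defined at build time.

```cpp
#include <cassert>

// Hypothetical types; forward declarations are enough for function declarations.
struct GraphBuilder;
struct Parse;

struct Resolution {
  // Declared unconditionally, as in the header under discussion.
  static int on_c1_resolution(const GraphBuilder* builder);
  static int on_c2_resolution(const Parse* parse);
};

#ifdef USE_C1  // hypothetical stand-in for COMPILER1
// The definition and its only caller are both guarded.
int Resolution::on_c1_resolution(const GraphBuilder*) { return 1; }
static int call_c1() { return Resolution::on_c1_resolution(nullptr); }
#endif

// With USE_C1 undefined, nothing references the declared functions,
// so the linker never looks for their (missing) definitions.
int resolution_hooks_used() {
  int used = 0;
#ifdef USE_C1
  used += call_c1();
#endif
  return used;
}
```

If a call to `Resolution::on_c2_resolution` were added outside any guard, the build would fail at link time with an undefined reference, which is the error the unguarded declarations alone do not produce.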
------------- PR: https://git.openjdk.org/jdk/pull/8383 From maurizio.cimadamore at oracle.com Thu Jul 7 14:36:17 2022 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 7 Jul 2022 15:36:17 +0100 Subject: Obsoleting JavaCritical In-Reply-To: References: <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> Message-ID: <54133cbd-95fe-7d87-c4fb-d46949a73787@oracle.com> I'm dropping most of the direct recipients and going back to just use panama-dev and hotspot-dev, as it appears that our server is having issues in handling too many recipients (the message that got delivered today was written a few days ago :-) ). I suggest everybody do the same, and just use mailing lists for further replies to this thread. Cheers Maurizio On 05/07/2022 12:33, Maurizio Cimadamore wrote: > > Hi, > As Erik explained in his reply, what we call "critical JNI" comes in > two pieces: one removes Java to native thread transitions (which is > what Wojciech is referring to), while another part interacts with the > GC locker (basically to allow critical JNI code to access Java arrays > w/o copying). I think the latter part is the most problematic GC-wise. > > Then, regarding the former, I think there are still questions as to > whether dropping transitions is the best way to get the performance > boost required; for instance, yesterday I did some experiments with an > experimental patch from Jorn (kudos) which re-enables an opt-in for > "trivial" native calls in the Panama API. I used it to test > clock_gettime, and, while there's an improvement, the results I got > were not as conclusive as one might expect. This is what I > get w/ state transitions:
>
> ```
> Benchmark                                 Mode  Cnt   Score   Error  Units
> ClockgettimeTest.panama_monotonic         avgt   30  27.814 ± 0.165  ns/op
> ClockgettimeTest.panama_monotonic_coarse  avgt   30  12.094 ± 0.103  ns/op
> ClockgettimeTest.panama_monotonic_raw     avgt   30  27.719 ± 0.393  ns/op
> ClockgettimeTest.panama_realtime          avgt   30  27.133 ± 0.280  ns/op
> ClockgettimeTest.panama_realtime_coarse   avgt   30  26.812 ± 0.384  ns/op
> ```
>
> And this is what I get with transitions removed:
>
> ```
> Benchmark                                 Mode  Cnt   Score   Error  Units
> ClockgettimeTest.panama_monotonic         avgt   30  22.383 ± 0.213  ns/op
> ClockgettimeTest.panama_monotonic_coarse  avgt   30   6.312 ± 0.117  ns/op
> ClockgettimeTest.panama_monotonic_raw     avgt   30  22.731 ± 0.279  ns/op
> ClockgettimeTest.panama_realtime          avgt   30  22.503 ± 0.292  ns/op
> ClockgettimeTest.panama_realtime_coarse   avgt   30  21.853 ± 0.100  ns/op
> ```
>
> Here we can see a gain of 4-5ns, obtained by dropping the transition. > The only case where this makes a significant difference is with the > monotonic_coarse flavor. In the other cases there's a difference, yes, > but not as pronounced, simply because the term we're comparing against > is bigger: it's easy to see a 5ns gain if your function runs for 10ns > in total - but such a gain starts to get lost in the "noise" when > functions run for longer. And that's the main issue with removing > Java->native transitions: the "window" in which this optimization > yields a positive effect is extremely narrow (anything lasting longer > than 30ns won't probably appreciate much difference), but, as you can > see from the PR in [1], the VM changes required to support it touch > quite a bit of stuff!
> > Luckily, selectively disabling transitions from Panama is slightly > more straightforward and, perhaps, for stuff like recvmsg syscalls > that are bypassed, there's not much else we can do: while one could > imagine Panama special-casing calls to clock_gettime, as that's a > known "leaf", the same cannot be done with rcvmsg, which is in general > a blocking call. Panama also has a "trusted mode" flag > (--enable-native-access), so there is a way in the Panama API to > distinguish between safe and unsafe API point, which also helps with > this. The risk of course is for developers to see whatever mechanism > is provided as some kind of "make my code go fast please" and apply it > blindly, w/o fully understanding the consequences. What I said before > about "extremely narrow window" remains true: in the vast majority of > cases (like 99%) dropping state transitions can result in very big > downsides, while the corresponding upsides are not big enough to even > be noticeable (the Q/A in [2] arrives at a very similar conclusion). > > All this said, selectively disabling state transitions from native > calls made using the Panama foreign API seem the most straightforward > way to offset the performance delta introduced by the removal of > critical JNI. In part it's because the Panama API is more flexible, > e.g. function descriptors allows us to model the distinction between a > trivial and non-trivial call; in part it's because, as stated above, > Panama can already reason about calls that are "unsafe" and that > require extra permissions. And, finally it's also because, if we added > back critical JNI, we'd probably add it back w/o its most problematic > GC locker parts (that's what [1] does AFAIK) - which means it won't be > a complete code reversal. So, perhaps, coming up with a fresh > mechanism to drop transitions (only) could also be less confusing for > developers. 
Of course this would require developers such as Wojciech > to rewrite some of the code to use Panama instead of JNI. > > And, coming back to clock_gettime, my feeling is that with the right > tools (e.g. some intrinsics), we can make that go a lot faster than > what is shown above. Being able to quickly get a timestamp seems a > widely-enough applicable use case to deserve some special treatment. > So, perhaps, it's worth considering a _spectrum of solutions_ on how > to improve the status quo, rather than investing solely in the removal > of thread transitions. > > Maurizio > > [1] - https://github.com/openjdk/jdk19/pull/90/files > [2] - https://youtu.be/LoyBTqkSkZk?t=742 > > > On 04/07/2022 18:38, Vitaly Davidovich wrote: >> To not sidetrack this thread with my previous reply: >> >> Maurizio - are you saying java criticals are *already* hindering ZGC >> and/or other planned Hotspot improvements? Or that theoretically they >> could and you'd like to remove/deprecate them now(ish)? >> >> If it's the former, perhaps it's prudent to keep them around until a >> compelling case surfaces where they preclude or severely restrict >> evolution of the platform? If it's the latter, would be curious what >> that is but would also understand the rationale behind wanting to >> remove it. >> >> On Mon, Jul 4, 2022 at 1:26 PM Vitaly Davidovich >> wrote: >> >> >> >> On Mon, Jul 4, 2022 at 1:13 PM Wojciech Kudla >> wrote: >> >> Thanks for your input, Vitaly. I'd be interested to find out >> more about the nature of the HW noise you observed in your >> benchmarks as our results were very consistent and it was >> pretty straightforward to pinpoint the culprit as JNI call >> overhead. Maybe it was just easier for us because we disallow >> C- and P-state transitions and put a lot of effort to >> eliminate platform jitter in general. Were you maybe running >> on a CPU model that doesn't support constant TSC?
I would >> also suggest retrying with LAPIC interrupts suppressed (with: >> cli/sti) to maybe see if it's the kernel and not the hardware. >> >> This was on a Broadwell Xeon chipset with constant tsc. All the >> typical jitter sources were reduced: C/P states disabled in bios, >> max turbo enabled, IRQs steered away, core isolated, etc. By the >> way, by noise I don't mean the results themselves were noisy - >> they were constant run to run. I just meant the delta between >> normal vs critical JNI entrypoints was very minimal - ie "in the >> noise", particularly with rdtsc. >> >> I can try to remeasure on newer Intel but see below ... >> >> >> >> 100% agree on rdtsc(p) and snippets. There are some narrow >> use cases where one can get some substantial speed ups with >> direct access to prefetch or by abusing misprediction to keep >> icache hot. These scenarios are sadly only available with >> inline assembly. I know of a few shops that go to the length >> of forking Graal, etc to achieve that but am quite convinced >> such capabilities would be welcome and utilized by many more >> groups if they were easily accessible from java. >> >> I'm of the firm (and perhaps controversial for some :)) opinion >> these days that Java is simply the wrong platform/tool for low >> latency cases that warrant this level of control. There're very >> strong headwinds even outside of JNI costs. And the "real" >> problem with JNI, besides transition costs, is lack of inlining >> into the native calls. So even if JVM transition costs are fully >> eliminated, there's still an optimization fence due to lost >> inlining (not unlike native code calling native fns via shared libs). >> >> That's not to say that perf regressions are welcomed - nobody likes >> those :). >> >> >> >> Thanks, >> W. >> >> On Mon, Jul 4, 2022 at 5:51 PM Vitaly Davidovich >> wrote: >> >> I'd add rdtsc(p) wrapper functions to the list.
These >> are usually either inline asm or compiler intrinsic in >> the JNI entrypoint. In addition, any native libs exposed >> via JNI that have "trivial" functions are also candidates >> for faster calling conventions. There are sometimes ways >> to mitigate the call overhead (eg batching) but it's not >> always feasible. >> >> I'll add that last time I tried to measure the >> improvement of Java criticals for clock_gettime (and >> rdtsc) it looked to be in the noise on the hardware I was >> testing on. It got to the point where I had to instrument >> the critical and normal JNI entrypoints to confirm the >> critical was being hit. The critical calling convention >> isn't significantly different *if* basic primitives (or >> no args at all) are passed as args. JNIEnv*, IIRC, is >> loaded from a register so that's minor. jclass (for >> static calls, which is what's relevant here) should be a >> compiled constant. Critical call still has a GCLocker >> check. So I'm not actually sure what the significant >> difference is for "lightweight" (ie few primitive or no >> args, primitive return types) calls. >> >> In general, I do think it'd be nice if there was a faster >> native call sequence, even if it comes with a caveat >> emptor and/or special requirements on the callee (not >> unlike the requirements for criticals). I think Vladimir >> Ivanov was working on "snippets" that allowed dynamic >> construction of a native call, possibly including >> assembly. Not sure where that exploration is these days, >> but that would be a welcome capability. >> >> My $.02. Happy 4th of July for those celebrating!
>> >> Vitaly >> >> On Mon, Jul 4, 2022 at 12:04 PM Maurizio Cimadamore >> wrote: >> >> Hi, >> while I'm not an expert with some of the IO calls you >> mention (some of my colleagues are more knowledgeable >> in this area, so I'm sure they will have more info), >> my general sense is that, as with getrusage, if there >> is a system call involved, you already pay a hefty >> price for the user to kernel transition. On my >> machine this seems to cost around 200ns. In these >> cases, using JNI critical to shave off a dozen >> nanoseconds (at best!) seems just not worth it. >> >> So, of the functions in your list, the ones in which >> I *believe* dropping transitions would have the most >> effect are (if we exclude getpid, for which another >> approach is possible) clock_gettime and getcpu, I >> believe, as they might use vdso [1], which typically >> brings the performance of these calls closer to calls >> to shared lib functions. >> >> If you have examples e.g. where performance of >> recvmsg (or related calls) varies significantly >> between base JNI and critical JNI, please send them >> our way; I'm sure some of my colleagues would be >> interested to take a look. >> >> Popping back a couple of levels, I think it would be >> helpful to also define what's an acceptable >> regression in this context. Of course, in an ideal >> world, we'd like to see no performance regression at >> all. But JNI critical is an unsupported interface, >> which might misbehave with modern garbage collectors >> (e.g. ZGC) and that requires quite a bit of internal >> complexity which might, in the medium/long run, >> hinder the evolution of the Java platform (all these >> things have _some_ cost, even if the cost is not >> directly material to developers). In this vein, I >> think calls like clock_gettime tend to be more >> problematic: as they complete very quickly, you see >> the cost of transitions a lot more.
In other cases, >> where syscalls are involved, the cost associated to >> transitions are more likely to be "in the noise". Of >> course if we look at absolute numbers, dropping >> transitions would always yield "faster" code; but at >> the same time, going from 250ns to 245ns is very >> unlikely to result in visible performance difference >> when considering an application as a whole, so I >> think it's critical here to decide _which_ use cases >> to prioritize. >> >> I think a good outcome of this discussion would be if >> we could come to some shared understanding of which >> native calls are truly problematic (e.g. >> clock_gettime-like), and then for the JDK to provide >> better (and more maintainable) alternatives for those >> (which might even be faster than using critical JNI). >> >> Thanks >> Maurizio >> >> [1] - https://man7.org/linux/man-pages/man7/vdso.7.html >> >> On 04/07/2022 12:23, Wojciech Kudla wrote: >>> Thanks Maurizio, >>> >>> I raised this case mainly about clock_gettime and >>> recvmsg/sendmsg, I think we're focusing on the wrong >>> things here. Feel free to drop the two syscalls from >>> the discussion entirely, but the main usecases I >>> have been presenting throughout this thread >>> definitely stand. >>> >>> Thanks >>> >>> >>> On Mon, Jul 4, 2022 at 10:54 AM Maurizio Cimadamore >>> wrote: >>> >>> Hi Wojtek, >>> thanks for sharing this list, I think this is a >>> good starting point to understand more about >>> your use case. >>> >>> Last week I've been looking at "getrusage" (as >>> you mentioned it in an earlier email), and I was >>> surprised to see that the call took a pointer to >>> a (fairly big) struct which then needed to be >>> initialized with some thread-local state: >>> >>> https://man7.org/linux/man-pages/man2/getrusage.2.html >>> >>> I've looked at the implementation, and it seems >>> to be doing memset on the user-provided struct >>> pointer, plus all the fields assignment. 
>>> Eyeballing the implementation, this does not >>> seem to me like a "classic" use case where >>> dropping transition would help much. I mean, >>> surely dropping transitions would help shaving >>> some nanoseconds off the call, but it doesn't >>> seem to me that the call would be shortlived >>> enough to make a difference. Do you have some >>> benchmarks on this one? I did some [1] and the >>> call overhead seemed to come up at 260ns/op - >>> w/o transition you might perhaps be able to get >>> to 250ns, but that's in the noise? >>> >>> As for getpid, note that you can do (since Java 9): >>> >>> ProcessHandle.current().pid(); >>> >>> I believe the impl caches the result, so it >>> shouldn't even make the native call. >>> >>> Maurizio >>> >>> [1] - >>> http://cr.openjdk.java.net/~mcimadamore/panama/GetrusageTest.java >>> >>> On 02/07/2022 07:42, Wojciech Kudla wrote: >>>> Hi Maurizio, >>>> >>>> Thanks for staying on this. >>>> >>>> > Could you please provide a rough list of the >>>> native calls you make where you believe >>>> critical JNI is having a real impact in the >>>> performance of your application? >>>> >>>> From the top of my head: >>>> clock_gettime >>>> recvmsg >>>> recvmmsg >>>> sendmsg >>>> sendmmsg >>>> select >>>> getpid >>>> getcpu >>>> getrusage >>>> >>>> > Also, could you please tell us whether any of >>>> these calls need to interact with Java arrays? >>>> No arrays or objects of any type involved. >>>> Everything happens by the means of passing raw >>>> pointers as longs and using other primitive >>>> types as function arguments. >>>> >>>> > In other words, do you use critical JNI to >>>> remove the cost associated with thread >>>> transitions, or are you also taking advantage >>>> of accessing on-heap memory _directly_ from >>>> native code? >>>> Criticial JNI natives are used solely to remove >>>> the cost of transitions. We don't get anywhere >>>> near java heap in native code. 
>>>> >>>> In general I think it makes a lot of sense for >>>> Java as a language/platform to have some guards >>>> around unsafe code, but on the other hand the >>>> popularity of libraries employing Unsafe and >>>> their success in more performance-oriented >>>> corners of software engineering is a clear >>>> indicator there is a need for the JVM to >>>> provide access to more low-level primitives and >>>> mechanisms. >>>> I think it's entirely fair to tell developers >>>> that all bets are off when they get into some >>>> non-idiomatic scenarios but please don't take >>>> away a feature that greatly contributed to >>>> Java's success. >>>> >>>> Kind regards, >>>> Wojtek >>>> >>>> On Wed, Jun 29, 2022 at 5:20 PM Maurizio >>>> Cimadamore wrote: >>>> >>>> Hi Wojciech, >>>> picking up this thread again. After some >>>> internal discussion, we realize that we >>>> don't know enough about your use case. >>>> While re-enabling JNI critical would >>>> obviously provide a quick fix, we're afraid >>>> that (a) developers might end up depending >>>> on JNI critical when they don't need to >>>> (perhaps also unaware of the consequences >>>> of depending on it) and (b) that there >>>> might actually be _better_ (as in: much >>>> faster) solutions than using critical >>>> native calls to address at least some of >>>> your use cases (that seemed to be the case >>>> with the clock_gettime example you >>>> mentioned). Could you please provide a >>>> rough list of the native calls you make >>>> where you believe critical JNI is having a >>>> real impact in the performance of your >>>> application? Also, could you please tell us >>>> whether any of these calls need to interact >>>> with Java arrays? In other words, do you >>>> use critical JNI to remove the cost >>>> associated with thread transitions, or are >>>> you also taking advantage of accessing >>>> on-heap memory _directly_ from native code? 
>>>> >>>> Regards >>>> Maurizio >>>> >>>> On 13/06/2022 21:38, Wojciech Kudla wrote: >>>>> Hi Mark, >>>>> >>>>> Thanks for your input and apologies for >>>>> the delayed response. >>>>> >>>>> > If the platform included, say, an >>>>> intrinsified System.nanoRealTime() >>>>> method that returned >>>>> clock_gettime(CLOCK_REALTIME), how much would >>>>> that help developers in your unnamed industry? >>>>> >>>>> Exposing realtime clock with nanosecond >>>>> granularity in the JDK would be a great >>>>> step forward. I should have made it clear >>>>> that I represent fintech corner >>>>> (investment banking to be exact) but the >>>>> issues my message touches upon span areas >>>>> such as HPC, audio processing, gaming, and >>>>> defense industry so it's not like we have >>>>> an isolated case. >>>>> >>>>> > In a similar vein, if people are finding >>>>> it necessary to "replace parts >>>>> of NIO with hand-crafted native code" then >>>>> it would be interesting to >>>>> understand what their requirements are >>>>> >>>>> As for the other example I provided with >>>>> making very short lived syscalls such as >>>>> recvmsg/recvmmsg the premise is getting >>>>> access to hardware timestamps on the >>>>> ingress and egress ends as well as >>>>> enabling batch receive with a single >>>>> syscall and otherwise exploiting features >>>>> unavailable from the JDK (like access to >>>>> CMSG interface, scatter/gather, etc). >>>>> There are also other examples of calls >>>>> that we'd love to make often and at lowest >>>>> possible cost (ie. getrusage) but I'm not >>>>> sure if there's a strong case for some of >>>>> these ideas, that's why it might be worth >>>>> looking into more generic approach for >>>>> performance sensitive code. >>>>> Hope this does better job at explaining >>>>> where we're coming from than my previous >>>>> messages.
>>>>> >>>>> Thanks, >>>>> W >>>>> >>>>> On Tue, Jun 7, 2022 at 6:31 PM >>>>> wrote: >>>>> >>>>> 2022/6/6 0:24:17 -0700, >>>>> wkudla.kernel at gmail.com: >>>>> >> Yes for System.nanoTime(), but >>>>> System.currentTimeMillis() reports >>>>> >> CLOCK_REALTIME. >>>>> > >>>>> > Unfortunately >>>>> System.currentTimeMillis() offers only >>>>> millisecond >>>>> > granularity which is the reason why >>>>> our industry has to resort to >>>>> > clock_gettime. >>>>> >>>>> If the platform included, say, an >>>>> intrinsified System.nanoRealTime() >>>>> method that returned >>>>> clock_gettime(CLOCK_REALTIME), how >>>>> much would >>>>> that help developers in your unnamed >>>>> industry? >>>>> >>>>> In a similar vein, if people are >>>>> finding it necessary to "replace parts >>>>> of NIO with hand-crafted native code" >>>>> then it would be interesting to >>>>> understand what their requirements >>>>> are. Some simple enhancements to >>>>> the NIO API would be much less costly >>>>> to design and implement than a >>>>> generalized user-level native-call >>>>> intrinsification mechanism. >>>>> >>>>> - Mark >>>>> >> -- >> Sent from my phone >> >> -- >> Sent from my phone >> >> -- >> Sent from my phone From duke at openjdk.org Thu Jul 7 14:56:39 2022 From: duke at openjdk.org (Justin Gu) Date: Thu, 7 Jul 2022 14:56:39 GMT Subject: RFR: 8289164: Convert ResolutionErrorTable to use ResourceHashtable [v2] In-Reply-To: References: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com> Message-ID: On Fri, 1 Jul 2022 18:48:54 GMT, Justin Gu wrote: >> Please review my change of converting the resolutionErrorTable from hashtable to resource hashtable. I tested my changes with a mach5 tier1-4 test. > > Justin Gu has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8289164: Convert ResolutionErrorTable to use ResourceHashtable Thank you everybody! ------------- PR: https://git.openjdk.org/jdk/pull/9337 From kvn at openjdk.org Thu Jul 7 18:57:33 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Jul 2022 18:57:33 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 07:51:01 GMT, Fei Gao wrote: > Superword doesn't vectorize any nodes of non-primitive types and > thus sets `allow_address` false when calling type2aelembytes() in > SuperWord::data_size()[1]. Therefore, when we try to resolve the > data size for a node of T_ADDRESS type, the assertion in > type2aelembytes()[2] takes effect. > > We try to resolve the data sizes for node s and node t in the > SuperWord::adjust_alignment_for_type_conversion()[3] when type > conversion between different data sizes happens. The issue is, > when node s is a ConvI2L node and node t is an AddP node of > T_ADDRESS type, type2aelembytes() will assert. To fix it, we > should filter out all non-primitive nodes, like the patch does > in SuperWord::adjust_alignment_for_type_conversion(). Since > it's a failure in the mid-end, all superword available platforms > are affected. In my local test, this failure can be reproduced > on both x86 and aarch64. With this patch, the failure can be fixed. > > Apart from fixing the bug, the patch also adds necessary type check > and does some clean-up in SuperWord::longer_type_for_conversion() > and VectorCastNode::implemented(). 
> > [1]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1417 > [2]https://github.com/openjdk/jdk/blob/b96ba19807845739b36274efb168dd048db819a3/src/hotspot/share/utilities/globalDefinitions.cpp#L326 > [3]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1454 In which call to `adjust_alignment_for_type_conversion()` you got AddP node? Should we add checks there too? ------------- PR: https://git.openjdk.org/jdk/pull/9391 From forax at openjdk.org Thu Jul 7 15:08:51 2022 From: forax at openjdk.org (=?UTF-8?B?UsOpbWk=?= Forax) Date: Thu, 7 Jul 2022 15:08:51 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: Message-ID: On Wed, 15 Jun 2022 09:30:59 GMT, Boris Ulasevich wrote: > Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. > > The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime > > Renaissance benchmarks shows no performance regressions on x86 and aarch. 
> > BEFORE: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > /* 8 | 4 */ const CompilerType _type; <<<< > /* 12 | 4 */ int _size; > /* 16 | 4 */ int _header_size; > /* 20 | 4 */ int _frame_complete_offset; > /* 24 | 4 */ int _data_offset; > /* 28 | 4 */ int _frame_size; > /* 32 | 8 */ address _code_begin; > /* 40 | 8 */ address _code_end; > /* 48 | 8 */ address _content_begin; > /* 56 | 8 */ address _data_end; > /* 64 | 8 */ address _relocation_begin; > /* 72 | 8 */ address _relocation_end; > /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 88 | 1 */ bool _caller_must_gc_arguments; > /* 89 | 1 */ bool _is_compiled; > /* XXX 6-byte hole */ > /* 96 | 8 */ const char *_name; > /* 104 | 8 */ class AsmRemarks { > /* 104 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 112 | 8 */ class DbgStrings { > /* 112 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 120 */ > } > > AFTER: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > protected: > /* 8 | 8 */ address _code_begin; > /* 16 | 8 */ address _code_end; > /* 24 | 8 */ address _content_begin; > /* 32 | 8 */ address _data_end; > /* 40 | 8 */ address _relocation_begin; > /* 48 | 8 */ address _relocation_end; > /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 64 | 8 */ const char *_name; > /* 72 | 4 */ int _size; > /* 76 | 4 */ int _header_size; > /* 80 | 4 */ int _frame_complete_offset; > /* 84 | 4 */ int _data_offset; > /* 88 | 4 */ int _frame_size; > /* 92 | 1 */ bool _caller_must_gc_arguments; > /* 93 | 1 */ bool _is_compiled; > /* 94 | 1 */ const CompilerType _type; <<<< > /* XXX 1-byte hole */ > /* 96 | 8 */ class AsmRemarks { > /* 96 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 104 | 8 */ class DbgStrings { > /* 104 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 112 */ > } > > BEFORE: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public 
CompiledMethod { > private: > /* 208 | 4 */ int _entry_bci; > /* XXX 4-byte hole */ > /* 216 | 8 */ uint64_t _gc_epoch; > /* 224 | 8 */ nmethod *_osr_link; > /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 240 | 8 */ address _entry_point; > /* 248 | 8 */ address _verified_entry_point; > /* 256 | 8 */ address _osr_entry_point; > /* 264 | 4 */ int _exception_offset; > /* 268 | 4 */ int _unwind_handler_offset; > /* 272 | 4 */ int _consts_offset; > /* 276 | 4 */ int _stub_offset; > /* 280 | 4 */ int _oops_offset; > /* 284 | 4 */ int _metadata_offset; > /* 288 | 4 */ int _scopes_data_offset; > /* 292 | 4 */ int _scopes_pcs_offset; > /* 296 | 4 */ int _dependencies_offset; > /* 300 | 4 */ int _handler_table_offset; > /* 304 | 4 */ int _nul_chk_table_offset; > /* 308 | 4 */ int _speculations_offset; > /* 312 | 4 */ int _jvmci_data_offset; > /* 316 | 4 */ int _nmethod_end_offset; > /* 320 | 4 */ int _orig_pc_offset; > /* 324 | 4 */ int _compile_id; > /* 328 | 4 */ int _comp_level; <<<< > /* 332 | 1 */ bool _has_flushed_dependencies; > /* 333 | 1 */ bool _unload_reported; > /* 334 | 1 */ bool _load_reported; > /* 335 | 1 */ volatile signed char _state; > /* 336 | 1 */ bool _oops_are_stale; > /* XXX 3-byte hole */ > /* 340 | 4 */ RTMState _rtm_state; > /* 344 | 4 */ volatile jint _lock_count; > /* XXX 4-byte hole */ > /* 352 | 8 */ volatile int64_t _stack_traversal_mark; > /* 360 | 4 */ int _hotness_counter; > /* 364 | 1 */ volatile uint8_t _is_unloading_state; > /* XXX 3-byte hole */ > /* 368 | 4 */ ByteSize _native_receiver_sp_offset; > /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; > > /* total size (bytes): 376 */ > } > > AFTER: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public CompiledMethod { > /* 200 | 8 */ uint64_t _gc_epoch; > /* 208 | 8 */ volatile int64_t _stack_traversal_mark; > /* 216 | 8 */ nmethod *_osr_link; > /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 232 | 8 */ 
address _entry_point; > /* 240 | 8 */ address _verified_entry_point; > /* 248 | 8 */ address _osr_entry_point; > /* 256 | 4 */ int _entry_bci; > /* 260 | 4 */ int _exception_offset; > /* 264 | 4 */ int _unwind_handler_offset; > /* 268 | 4 */ int _consts_offset; > /* 272 | 4 */ int _stub_offset; > /* 276 | 4 */ int _oops_offset; > /* 280 | 4 */ int _metadata_offset; > /* 284 | 4 */ int _scopes_data_offset; > /* 288 | 4 */ int _scopes_pcs_offset; > /* 292 | 4 */ int _dependencies_offset; > /* 296 | 4 */ int _handler_table_offset; > /* 300 | 4 */ int _nul_chk_table_offset; > /* 304 | 4 */ int _speculations_offset; > /* 308 | 4 */ int _jvmci_data_offset; > /* 312 | 4 */ int _nmethod_end_offset; > /* 316 | 4 */ int _orig_pc_offset; > /* 320 | 4 */ int _compile_id; > /* 324 | 4 */ RTMState _rtm_state; > /* 328 | 4 */ volatile jint _lock_count; > /* 332 | 4 */ int _hotness_counter; > /* 336 | 4 */ ByteSize _native_receiver_sp_offset; > /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; > /* 344 | 1 */ CompLevel _comp_level; <<<< > /* 345 | 1 */ volatile uint8_t _is_unloading_state; > /* 346 | 1 */ bool _has_flushed_dependencies; > /* 347 | 1 */ bool _unload_reported; > /* 348 | 1 */ bool _load_reported; > /* 349 | 1 */ volatile signed char _state; > /* 350 | 1 */ bool _oops_are_stale; > > /* total size (bytes): 352 */ > } It's also a good idea to have 64 bytes in between volatiles so they are not in the same cache-line ------------- PR: https://git.openjdk.org/jdk/pull/9165 From eosterlund at openjdk.org Thu Jul 7 19:13:45 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 7 Jul 2022 19:13:45 GMT Subject: RFR: 8286957: Held monitor count [v8] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:43:35 GMT, Robbin Ehn wrote: >> The current implementation do not count all monitor enter, counts high up in abstraction and causes a performance regression on aarch64 with some benchmarks due to C2 changes. 
>> >> This change makes the counting exact by pushing the counting down in the abstraction. >> The additional JNI counter is strictly not needed, but enables us to figure out if we have monitors "on stack". >> >> An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement. >> >> Fixed aarch64, x64, x86 and zero. >> >> Passes t1-8 > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed strw, zero rename and made methods return 64 bit counter in all cases I could not find any error. Looks good to me. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/8945 From coleenp at openjdk.org Thu Jul 7 15:16:40 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Jul 2022 15:16:40 GMT Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 19:09:09 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Do not remove Forte::register_stub as it is used on Linux as well > > src/hotspot/share/prims/forte.hpp line 32: > >> 30: class Forte : AllStatic { >> 31: public: >> 32: static bool is_enabled() NOT_JVMTI_RETURN_(false); > > I don't think the rest of this forte code is disabled by JVMTI. If the answer to whether it's enabled is something you want to be fast, and doesn't change, maybe make it check a variable? 
------------- PR: https://git.openjdk.org/jdk/pull/9386 From duke at openjdk.org Thu Jul 7 15:00:17 2022 From: duke at openjdk.org (Justin Gu) Date: Thu, 7 Jul 2022 15:00:17 GMT Subject: Integrated: 8289164: Convert ResolutionErrorTable to use ResourceHashtable In-Reply-To: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com> References: <4Vj4wWy9DvJqV0CHHVy4Z3-TNysikK9DjyZ9H_8Kd90=.0eb4cefd-1ec9-4f3f-b6b0-05b673ea46af@github.com> Message-ID: On Thu, 30 Jun 2022 15:15:45 GMT, Justin Gu wrote: > Please review my change of converting the resolutionErrorTable from hashtable to resource hashtable. I tested my changes with a mach5 tier1-4 test. This pull request has now been integrated. Changeset: 86f63f97 Author: Justin Gu Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/86f63f9703b47b3b5b8fd093dbd117d8746091ff Stats: 303 lines in 7 files changed: 124 ins; 110 del; 69 mod 8289164: Convert ResolutionErrorTable to use ResourceHashtable Reviewed-by: iklam, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/9337 From kvn at openjdk.org Thu Jul 7 18:40:42 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Jul 2022 18:40:42 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: Message-ID: On Wed, 15 Jun 2022 09:30:59 GMT, Boris Ulasevich wrote: > Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. > > The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime > > Renaissance benchmarks shows no performance regressions on x86 and aarch. 
> > BEFORE: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > /* 8 | 4 */ const CompilerType _type; <<<< > /* 12 | 4 */ int _size; > /* 16 | 4 */ int _header_size; > /* 20 | 4 */ int _frame_complete_offset; > /* 24 | 4 */ int _data_offset; > /* 28 | 4 */ int _frame_size; > /* 32 | 8 */ address _code_begin; > /* 40 | 8 */ address _code_end; > /* 48 | 8 */ address _content_begin; > /* 56 | 8 */ address _data_end; > /* 64 | 8 */ address _relocation_begin; > /* 72 | 8 */ address _relocation_end; > /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 88 | 1 */ bool _caller_must_gc_arguments; > /* 89 | 1 */ bool _is_compiled; > /* XXX 6-byte hole */ > /* 96 | 8 */ const char *_name; > /* 104 | 8 */ class AsmRemarks { > /* 104 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 112 | 8 */ class DbgStrings { > /* 112 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 120 */ > } > > AFTER: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > protected: > /* 8 | 8 */ address _code_begin; > /* 16 | 8 */ address _code_end; > /* 24 | 8 */ address _content_begin; > /* 32 | 8 */ address _data_end; > /* 40 | 8 */ address _relocation_begin; > /* 48 | 8 */ address _relocation_end; > /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 64 | 8 */ const char *_name; > /* 72 | 4 */ int _size; > /* 76 | 4 */ int _header_size; > /* 80 | 4 */ int _frame_complete_offset; > /* 84 | 4 */ int _data_offset; > /* 88 | 4 */ int _frame_size; > /* 92 | 1 */ bool _caller_must_gc_arguments; > /* 93 | 1 */ bool _is_compiled; > /* 94 | 1 */ const CompilerType _type; <<<< > /* XXX 1-byte hole */ > /* 96 | 8 */ class AsmRemarks { > /* 96 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 104 | 8 */ class DbgStrings { > /* 104 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 112 */ > } > > BEFORE: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public 
CompiledMethod { > private: > /* 208 | 4 */ int _entry_bci; > /* XXX 4-byte hole */ > /* 216 | 8 */ uint64_t _gc_epoch; > /* 224 | 8 */ nmethod *_osr_link; > /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 240 | 8 */ address _entry_point; > /* 248 | 8 */ address _verified_entry_point; > /* 256 | 8 */ address _osr_entry_point; > /* 264 | 4 */ int _exception_offset; > /* 268 | 4 */ int _unwind_handler_offset; > /* 272 | 4 */ int _consts_offset; > /* 276 | 4 */ int _stub_offset; > /* 280 | 4 */ int _oops_offset; > /* 284 | 4 */ int _metadata_offset; > /* 288 | 4 */ int _scopes_data_offset; > /* 292 | 4 */ int _scopes_pcs_offset; > /* 296 | 4 */ int _dependencies_offset; > /* 300 | 4 */ int _handler_table_offset; > /* 304 | 4 */ int _nul_chk_table_offset; > /* 308 | 4 */ int _speculations_offset; > /* 312 | 4 */ int _jvmci_data_offset; > /* 316 | 4 */ int _nmethod_end_offset; > /* 320 | 4 */ int _orig_pc_offset; > /* 324 | 4 */ int _compile_id; > /* 328 | 4 */ int _comp_level; <<<< > /* 332 | 1 */ bool _has_flushed_dependencies; > /* 333 | 1 */ bool _unload_reported; > /* 334 | 1 */ bool _load_reported; > /* 335 | 1 */ volatile signed char _state; > /* 336 | 1 */ bool _oops_are_stale; > /* XXX 3-byte hole */ > /* 340 | 4 */ RTMState _rtm_state; > /* 344 | 4 */ volatile jint _lock_count; > /* XXX 4-byte hole */ > /* 352 | 8 */ volatile int64_t _stack_traversal_mark; > /* 360 | 4 */ int _hotness_counter; > /* 364 | 1 */ volatile uint8_t _is_unloading_state; > /* XXX 3-byte hole */ > /* 368 | 4 */ ByteSize _native_receiver_sp_offset; > /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; > > /* total size (bytes): 376 */ > } > > AFTER: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public CompiledMethod { > /* 200 | 8 */ uint64_t _gc_epoch; > /* 208 | 8 */ volatile int64_t _stack_traversal_mark; > /* 216 | 8 */ nmethod *_osr_link; > /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 232 | 8 */ 
address _entry_point; > /* 240 | 8 */ address _verified_entry_point; > /* 248 | 8 */ address _osr_entry_point; > /* 256 | 4 */ int _entry_bci; > /* 260 | 4 */ int _exception_offset; > /* 264 | 4 */ int _unwind_handler_offset; > /* 268 | 4 */ int _consts_offset; > /* 272 | 4 */ int _stub_offset; > /* 276 | 4 */ int _oops_offset; > /* 280 | 4 */ int _metadata_offset; > /* 284 | 4 */ int _scopes_data_offset; > /* 288 | 4 */ int _scopes_pcs_offset; > /* 292 | 4 */ int _dependencies_offset; > /* 296 | 4 */ int _handler_table_offset; > /* 300 | 4 */ int _nul_chk_table_offset; > /* 304 | 4 */ int _speculations_offset; > /* 308 | 4 */ int _jvmci_data_offset; > /* 312 | 4 */ int _nmethod_end_offset; > /* 316 | 4 */ int _orig_pc_offset; > /* 320 | 4 */ int _compile_id; > /* 324 | 4 */ RTMState _rtm_state; > /* 328 | 4 */ volatile jint _lock_count; > /* 332 | 4 */ int _hotness_counter; > /* 336 | 4 */ ByteSize _native_receiver_sp_offset; > /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; > /* 344 | 1 */ CompLevel _comp_level; <<<< > /* 345 | 1 */ volatile uint8_t _is_unloading_state; > /* 346 | 1 */ bool _has_flushed_dependencies; > /* 347 | 1 */ bool _unload_reported; > /* 348 | 1 */ bool _load_reported; > /* 349 | 1 */ volatile signed char _state; > /* 350 | 1 */ bool _oops_are_stale; > > /* total size (bytes): 352 */ > } What about CompiledMethod? Would be interesting to profile fields access. I assume hot fields should be in first cache line. Or it is not important? 
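For reference, both points raised here (holes removed by ordering fields from largest to smallest, and whether a given field ends up in the first cache line) can be checked with a small standalone sketch. The struct and field names below are hypothetical stand-ins rather than the real nmethod, and the 64-byte line size is an assumption:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a few header fields; NOT the real nmethod layout.
struct UnsortedHeader {
  int32_t  entry_bci;            // a hole follows when 64-bit fields are 8-byte aligned
  uint64_t gc_epoch;
  bool     load_reported;        // a hole follows before the next 4-byte field
  int32_t  lock_count;
  int64_t  stack_traversal_mark;
  uint8_t  is_unloading_state;   // a hole follows before the next 4-byte field
  int32_t  sp_offset;
};

// The same fields, ordered from largest to smallest.
struct SortedHeader {
  uint64_t gc_epoch;
  int64_t  stack_traversal_mark;
  int32_t  entry_bci;
  int32_t  lock_count;
  int32_t  sp_offset;
  bool     load_reported;
  uint8_t  is_unloading_state;
};

// Assumed cache line size; 64 bytes is common but not universal.
constexpr std::size_t kAssumedCacheLine = 64;

// True if a field occupying [offset, offset + size) lies entirely within the
// first cache line of an object that itself starts on a line boundary.
constexpr bool in_first_cache_line(std::size_t offset, std::size_t size) {
  return offset + size <= kAssumedCacheLine;
}

// Sorting by size should remove padding on common ABIs.
static_assert(sizeof(SortedHeader) < sizeof(UnsortedHeader),
              "expected the sorted layout to be smaller");
```

On a typical LP64 ABI the two structs come out at 40 and 32 bytes. Which fields are actually hot would still have to be measured; `in_first_cache_line(offsetof(T, field), sizeof(field))` only tells where a field landed, not whether that matters.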
------------- PR: https://git.openjdk.org/jdk/pull/9165 From jorn.vernee at oracle.com Thu Jul 7 18:22:25 2022 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Thu, 7 Jul 2022 20:22:25 +0200 Subject: Obsoleting JavaCritical In-Reply-To: References: <1c3e7789-f764-289e-dd0b-2f4f1b250acd@oracle.com> <04248465-fee4-20ba-c2a5-217d7867c6f4@oracle.com> <20220607103108.900830823@eggemoggin.niobe.net> <4857ff3a-eef5-d7ef-9cff-ff89441710a0@oracle.com> <4325a770-638d-e15e-d3f6-783a47181f31@oracle.com> <21506449-753B-4483-B10C-8C5991999BD8@oracle.com> Message-ID: <719e318a-a745-79cf-274c-81ccc1b32211@oracle.com> The Java->native transition is required to let the rest of the VM know it doesn't have to wait for this particular thread when waiting for all threads to get to a safepoint (global safepoint). Not doing the state transition means that, when a global safepoint is requested, the rest of the VM (including other threads running Java code) will block at their respective safepoints until the native call returns to Java and reaches a subsequent safepoint of its own later in the code. If the native call takes a long time to complete, this would be detrimental to performance. The thing that's important to allow for upcalls is setting the "frame anchor", which describes the last Java frame, in a thread local. This frame anchor is saved as part of the "entry" frame when an upcall happens, and is used by stack walking code to "jump" over all the native frames in between an upcall and downcall, to continue walking Java frames on the other (downcall) side. Jorn On 06/07/2022 20:19, Ioannis Tsakpinis wrote: > Afaik, the Java->native transition is required because an upcall back > to Java during the JNI downcall would otherwise crash the JVM. Are > transitions required for any other reason?
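To make the two roles of the transition concrete, here is a toy model, not HotSpot code. Every name in it (`ToyThread`, `enter_native`, `threads_to_wait_for`, and so on) is hypothetical, and the real VM's safepoint protocol is considerably more involved; the sketch only captures "threads marked as in native are not waited for" and "the anchor records the last Java frame":

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy thread states; the real VM has more states than these two.
enum class ToyState { InJava, InNative };

// Toy "frame anchor": remembers where the last Java frame was, so a stack
// walker could skip the native frames above it.
struct ToyFrameAnchor {
  const void* last_java_sp = nullptr;
  const void* last_java_pc = nullptr;
};

struct ToyThread {
  ToyState state = ToyState::InJava;
  ToyFrameAnchor anchor;
};

// Downcall with a state transition: record the anchor, then publish the state.
void enter_native(ToyThread& t, const void* sp, const void* pc) {
  t.anchor.last_java_sp = sp;
  t.anchor.last_java_pc = pc;
  t.state = ToyState::InNative;
}

// Return from the downcall. A real VM would also check for a pending
// safepoint here; this toy model omits that step.
void return_to_java(ToyThread& t) {
  assert(t.state == ToyState::InNative);
  t.state = ToyState::InJava;
  t.anchor = ToyFrameAnchor();
}

// Number of threads a (toy) safepoint request still has to wait for:
// only threads that appear to be running Java code.
std::size_t threads_to_wait_for(const std::vector<ToyThread>& threads) {
  std::size_t n = 0;
  for (const ToyThread& t : threads) {
    if (t.state == ToyState::InJava) ++n;
  }
  return n;
}
```

In this model a native call that skipped `enter_native` would leave its thread counted as `InJava` for the whole duration of the call, which is the stall described above, and it would leave no anchor for a later upcall to save.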
From aph at openjdk.org Thu Jul 7 16:13:50 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 7 Jul 2022 16:13:50 GMT Subject: RFR: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: On Fri, 17 Jun 2022 20:06:55 GMT, Dmitry Chuyko wrote: >> On AArch64 it is sometimes convenient to have LSE atomics right from the start. Currently they are enabled after feature detection and RR reverse debugger works incorrectly. >> >> New build configuration feature 'hardlse' is added. If it is enabled for aarch64 type of build, then statically compiled stubs replace the initial pessimistic implementation and dynamically generated replacements (when LSE support is detected). The feature works for builds of all debug levels. >> >> New file atomic_linux_aarch64_lse.S is derived from atomic_linux_aarch64.S and inherits its copyright. This alternative static implementation corresponds to the dynamically generated code. >> >> Note, this configuration part is necessary but not sufficient to fully avoid strex instructions for practical purposes. Other parts are: >> >> * Run on the OS built without strex family instructions. E.g. Amazon Linux 2022. >> * Compile with outline atomics enabled and the configuration flag enabled. E.g. configure with >> --with-extra-cflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-cxxflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-ldflags='-Wl,--allow-multiple-definition' --with-jvm-features=hardlse >> >> Testing: tier1, tier2 on linux-aarch64 release builds with feature off and feature on. > > Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Moved 2 prfm-s from wrong ifdef branch > - Removed unnecessary changes (forced UseLSE, blank lines) > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Use LSE in linux-aarch64 asm code if __ARM_FEATURE_ATOMICS is on > - Revert "hardlse feature" > > This reverts commit c5da85d3282bb995f69639f8f592cc94560916c5. > - Merge branch 'openjdk:master' into JDK-8282322 > - ... and 1 more: https://git.openjdk.org/jdk/compare/9c7a7277...d1ae97d9 Is this dead, or... ? ------------- PR: https://git.openjdk.org/jdk/pull/8779 From phh at openjdk.org Thu Jul 7 18:23:50 2022 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 7 Jul 2022 18:23:50 GMT Subject: RFR: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: On Fri, 17 Jun 2022 20:06:55 GMT, Dmitry Chuyko wrote: >> On AArch64 it is sometimes convenient to have LSE atomics right from the start. Currently they are enabled after feature detection and RR reverse debugger works incorrectly. >> >> New build configuration feature 'hardlse' is added. If it is enabled for aarch64 type of build, then statically compiled stubs replace the initial pessimistic implementation and dynamically generated replacements (when LSE support is detected). The feature works for builds of all debug levels. >> >> New file atomic_linux_aarch64_lse.S is derived from atomic_linux_aarch64.S and inherits its copyright. This alternative static implementation corresponds to the dynamically generated code. >> >> Note, this configuration part is necessary but not sufficient to fully avoid strex instructions for practical purposes. Other parts are: >> >> * Run on the OS built without strex family instructions. 
E.g. Amazon Linux 2022. >> * Compile with outline atomics enabled and the configuration flag enabled. E.g. configure with >> --with-extra-cflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-cxxflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-ldflags='-Wl,--allow-multiple-definition' --with-jvm-features=hardlse >> >> Testing: tier1, tier2 on linux-aarch64 release builds with feature off and feature on. > > Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Moved 2 prfm-s from wrong ifdef branch > - Removed unnecessary changes (forced UseLSE, blank lines) > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Use LSE in linux-aarch64 asm code if __ARM_FEATURE_ATOMICS is on > - Revert "hardlse feature" > > This reverts commit c5da85d3282bb995f69639f8f592cc94560916c5. > - Merge branch 'openjdk:master' into JDK-8282322 > - ... and 1 more: https://git.openjdk.org/jdk/compare/6181cfc9...d1ae97d9 Andrew, looks like Dima will be good to go once you re-review it. 
------------- PR: https://git.openjdk.org/jdk/pull/8779 From iklam at openjdk.org Thu Jul 7 17:13:47 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 7 Jul 2022 17:13:47 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive [v2] In-Reply-To: <_dOmbbRg9ysGMQ2QgyeEdLuz8lYj3u90e9gzi-cvGQI=.f54df3de-48a7-4fc3-9ff6-19bda30f8213@github.com> References: <_dOmbbRg9ysGMQ2QgyeEdLuz8lYj3u90e9gzi-cvGQI=.f54df3de-48a7-4fc3-9ff6-19bda30f8213@github.com> Message-ID: On Thu, 7 Jul 2022 13:01:38 GMT, Coleen Phillimore wrote: >> This trivial change just adds a comment to Klass::is_loader_alive. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo. LGTM ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.org/jdk/pull/9400 From jwilhelm at openjdk.org Thu Jul 7 22:29:42 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Thu, 7 Jul 2022 22:29:42 GMT Subject: Withdrawn: Merge jdk19 In-Reply-To: <-KkCuVSAMiTg4BocRz3OwWxTQ7_0TIjXqqGQ4OicoOg=.c4255a3a-29b0-4fb5-9334-fe694249ad44@github.com> References: <-KkCuVSAMiTg4BocRz3OwWxTQ7_0TIjXqqGQ4OicoOg=.c4255a3a-29b0-4fb5-9334-fe694249ad44@github.com> Message-ID: On Thu, 7 Jul 2022 20:14:12 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 19 -> JDK 20 This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/9415 From coleenp at openjdk.org Thu Jul 7 20:27:44 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Jul 2022 20:27:44 GMT Subject: RFR: 8278923: Document Klass::is_loader_alive [v2] In-Reply-To: <_dOmbbRg9ysGMQ2QgyeEdLuz8lYj3u90e9gzi-cvGQI=.f54df3de-48a7-4fc3-9ff6-19bda30f8213@github.com> References: <_dOmbbRg9ysGMQ2QgyeEdLuz8lYj3u90e9gzi-cvGQI=.f54df3de-48a7-4fc3-9ff6-19bda30f8213@github.com> Message-ID: <2vNm_u5aG56ZWcCKQsZQgZlFU_lrnld6gI1TrCZ4QpU=.09768ab8-67e7-4bac-aa3f-ec2706c572da@github.com> On Thu, 7 Jul 2022 13:01:38 GMT, Coleen Phillimore wrote: >> This trivial change just adds a comment to Klass::is_loader_alive. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo. Thanks Ioi! ------------- PR: https://git.openjdk.org/jdk/pull/9400 From fgao at openjdk.org Fri Jul 8 01:48:41 2022 From: fgao at openjdk.org (Fei Gao) Date: Fri, 8 Jul 2022 01:48:41 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 18:54:06 GMT, Vladimir Kozlov wrote: > In which call to `adjust_alignment_for_type_conversion()` you got AddP node? Should we add checks there too? Thanks for your review, @vnkozlov . When we called `adjust_alignment_for_type_conversion()` in `SuperWord::follow_def_uses()`, https://github.com/openjdk/jdk/blob/3f1174aa4709aabcfde8b40deec88b8ed466cc06/src/hotspot/share/opto/superword.cpp#L1525, we got AddP node. In this function, we also call `stmts_can_pack()` on the next line, which has checks to prevent unwanted pairs, https://github.com/openjdk/jdk/blob/3f1174aa4709aabcfde8b40deec88b8ed466cc06/src/hotspot/share/opto/superword.cpp#L1202. Maybe we don't have to add one more. WDYT? 
------------- PR: https://git.openjdk.org/jdk/pull/9391 From dlong at openjdk.org Thu Jul 7 22:57:42 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Jul 2022 22:57:42 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: Message-ID: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> On Wed, 15 Jun 2022 09:30:59 GMT, Boris Ulasevich wrote: > Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. > > The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime > > Renaissance benchmarks shows no performance regressions on x86 and aarch. 
> > BEFORE: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > /* 8 | 4 */ const CompilerType _type; <<<< > /* 12 | 4 */ int _size; > /* 16 | 4 */ int _header_size; > /* 20 | 4 */ int _frame_complete_offset; > /* 24 | 4 */ int _data_offset; > /* 28 | 4 */ int _frame_size; > /* 32 | 8 */ address _code_begin; > /* 40 | 8 */ address _code_end; > /* 48 | 8 */ address _content_begin; > /* 56 | 8 */ address _data_end; > /* 64 | 8 */ address _relocation_begin; > /* 72 | 8 */ address _relocation_end; > /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 88 | 1 */ bool _caller_must_gc_arguments; > /* 89 | 1 */ bool _is_compiled; > /* XXX 6-byte hole */ > /* 96 | 8 */ const char *_name; > /* 104 | 8 */ class AsmRemarks { > /* 104 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 112 | 8 */ class DbgStrings { > /* 112 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 120 */ > } > > AFTER: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > protected: > /* 8 | 8 */ address _code_begin; > /* 16 | 8 */ address _code_end; > /* 24 | 8 */ address _content_begin; > /* 32 | 8 */ address _data_end; > /* 40 | 8 */ address _relocation_begin; > /* 48 | 8 */ address _relocation_end; > /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 64 | 8 */ const char *_name; > /* 72 | 4 */ int _size; > /* 76 | 4 */ int _header_size; > /* 80 | 4 */ int _frame_complete_offset; > /* 84 | 4 */ int _data_offset; > /* 88 | 4 */ int _frame_size; > /* 92 | 1 */ bool _caller_must_gc_arguments; > /* 93 | 1 */ bool _is_compiled; > /* 94 | 1 */ const CompilerType _type; <<<< > /* XXX 1-byte hole */ > /* 96 | 8 */ class AsmRemarks { > /* 96 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 104 | 8 */ class DbgStrings { > /* 104 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 112 */ > } > > BEFORE: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public 
CompiledMethod { > private: > /* 208 | 4 */ int _entry_bci; > /* XXX 4-byte hole */ > /* 216 | 8 */ uint64_t _gc_epoch; > /* 224 | 8 */ nmethod *_osr_link; > /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 240 | 8 */ address _entry_point; > /* 248 | 8 */ address _verified_entry_point; > /* 256 | 8 */ address _osr_entry_point; > /* 264 | 4 */ int _exception_offset; > /* 268 | 4 */ int _unwind_handler_offset; > /* 272 | 4 */ int _consts_offset; > /* 276 | 4 */ int _stub_offset; > /* 280 | 4 */ int _oops_offset; > /* 284 | 4 */ int _metadata_offset; > /* 288 | 4 */ int _scopes_data_offset; > /* 292 | 4 */ int _scopes_pcs_offset; > /* 296 | 4 */ int _dependencies_offset; > /* 300 | 4 */ int _handler_table_offset; > /* 304 | 4 */ int _nul_chk_table_offset; > /* 308 | 4 */ int _speculations_offset; > /* 312 | 4 */ int _jvmci_data_offset; > /* 316 | 4 */ int _nmethod_end_offset; > /* 320 | 4 */ int _orig_pc_offset; > /* 324 | 4 */ int _compile_id; > /* 328 | 4 */ int _comp_level; <<<< > /* 332 | 1 */ bool _has_flushed_dependencies; > /* 333 | 1 */ bool _unload_reported; > /* 334 | 1 */ bool _load_reported; > /* 335 | 1 */ volatile signed char _state; > /* 336 | 1 */ bool _oops_are_stale; > /* XXX 3-byte hole */ > /* 340 | 4 */ RTMState _rtm_state; > /* 344 | 4 */ volatile jint _lock_count; > /* XXX 4-byte hole */ > /* 352 | 8 */ volatile int64_t _stack_traversal_mark; > /* 360 | 4 */ int _hotness_counter; > /* 364 | 1 */ volatile uint8_t _is_unloading_state; > /* XXX 3-byte hole */ > /* 368 | 4 */ ByteSize _native_receiver_sp_offset; > /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; > > /* total size (bytes): 376 */ > } > > AFTER: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public CompiledMethod { > /* 200 | 8 */ uint64_t _gc_epoch; > /* 208 | 8 */ volatile int64_t _stack_traversal_mark; > /* 216 | 8 */ nmethod *_osr_link; > /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 232 | 8 */ 
address _entry_point; > /* 240 | 8 */ address _verified_entry_point; > /* 248 | 8 */ address _osr_entry_point; > /* 256 | 4 */ int _entry_bci; > /* 260 | 4 */ int _exception_offset; > /* 264 | 4 */ int _unwind_handler_offset; > /* 268 | 4 */ int _consts_offset; > /* 272 | 4 */ int _stub_offset; > /* 276 | 4 */ int _oops_offset; > /* 280 | 4 */ int _metadata_offset; > /* 284 | 4 */ int _scopes_data_offset; > /* 288 | 4 */ int _scopes_pcs_offset; > /* 292 | 4 */ int _dependencies_offset; > /* 296 | 4 */ int _handler_table_offset; > /* 300 | 4 */ int _nul_chk_table_offset; > /* 304 | 4 */ int _speculations_offset; > /* 308 | 4 */ int _jvmci_data_offset; > /* 312 | 4 */ int _nmethod_end_offset; > /* 316 | 4 */ int _orig_pc_offset; > /* 320 | 4 */ int _compile_id; > /* 324 | 4 */ RTMState _rtm_state; > /* 328 | 4 */ volatile jint _lock_count; > /* 332 | 4 */ int _hotness_counter; > /* 336 | 4 */ ByteSize _native_receiver_sp_offset; > /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; > /* 344 | 1 */ CompLevel _comp_level; <<<< > /* 345 | 1 */ volatile uint8_t _is_unloading_state; > /* 346 | 1 */ bool _has_flushed_dependencies; > /* 347 | 1 */ bool _unload_reported; > /* 348 | 1 */ bool _load_reported; > /* 349 | 1 */ volatile signed char _state; > /* 350 | 1 */ bool _oops_are_stale; > > /* total size (bytes): 352 */ > } Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. There is also a lot of unnecessary space used by these addresses: address _code_begin; address _code_end; address _content_begin; address _data_end; address _relocation_begin; address _relocation_end; Now that AOT has been removed, we could go back to 3 int fields like in jdk8. 
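As a rough illustration of the offsets idea, here is a minimal sketch, assuming all sections of a blob are laid out contiguously after its header. The struct names, the choice of which three ints to add, and the `base` parameter are hypothetical and are not taken from jdk8 or from this patch:

```cpp
#include <cassert>
#include <cstdint>

typedef unsigned char* address;  // stand-in for HotSpot's address typedef

// Six section boundaries stored as full pointers (48 bytes on LP64).
struct PointerSections {
  address code_begin;
  address code_end;
  address content_begin;
  address data_end;
  address relocation_begin;
  address relocation_end;
};

// The same boundaries derived from 32-bit values relative to the blob start.
struct OffsetSections {
  // Sizes/offsets of the kind the header already carries.
  int32_t size;
  int32_t header_size;
  int32_t data_offset;
  // Hypothetical three additional ints (12 bytes in total).
  int32_t relocation_size;
  int32_t content_offset;
  int32_t code_offset;

  // `base` is the address of the blob header itself.
  address relocation_begin(address base) const { return base + header_size; }
  address relocation_end(address base)   const { return relocation_begin(base) + relocation_size; }
  address content_begin(address base)    const { return base + content_offset; }
  address code_begin(address base)       const { return base + code_offset; }
  address code_end(address base)         const { return base + data_offset; }
  address data_end(address base)         const { return base + size; }

  // Sanity check of the section ordering this sketch assumes.
  void check_ordering() const {
    assert(header_size <= content_offset);
    assert(content_offset <= code_offset);
    assert(code_offset <= data_offset);
    assert(data_offset <= size);
  }
};
```

The trade-off is one extra add per accessor in exchange for the smaller header, and it only works while every section really is reachable as an offset from the header.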
src/hotspot/share/code/nmethod.hpp line 267: > 265: ByteSize _native_basic_lock_sp_offset; > 266: > 267: CompLevel _comp_level; // compilation level To minimize changes in other files, how about just making this: int8_t _comp_level; ------------- PR: https://git.openjdk.org/jdk/pull/9165 From lmesnik at openjdk.org Fri Jul 8 00:53:43 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 8 Jul 2022 00:53:43 GMT Subject: RFR: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 20:43:09 GMT, Coleen Phillimore wrote: > This change uses sed to change sun.hotspot.WhiteBox to jdk.test.whitebox.Whitebox, and sun/hotspot/Whitebox similarly. Due to indirect inclusions of some of the test libraries, changing a few wasn't a reliable option, and I need the new one for a different change I was looking at. > The non-sed changes are for jdk/test/whitebox/WhiteBox to add some code for GC that was only added to the sun version. > Also, the ClassFileInstaller has a label for sun.hotspot.Whitebox so that didn't change with the edit. > Tested with tiers1-6. Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9417 From duke at openjdk.org Fri Jul 8 03:10:59 2022 From: duke at openjdk.org (duke) Date: Fri, 8 Jul 2022 03:10:59 GMT Subject: Withdrawn: 8283232: x86: Improve vector broadcast operations In-Reply-To: References: Message-ID: <3kVB6o7RASf4cTtldfYCrl8g2zufvlljCyLYqhbT-Yg=.53fd3656-9e2e-4bf5-b6a2-b28477c6e0e8@github.com> On Wed, 16 Mar 2022 01:19:24 GMT, Quan Anh Mai wrote: > Hi, > > This patch improves the generation of broadcasting a scalar in several ways: > > - Avoid potential data bypass delay which can be observed on some platforms by using the correct type of instruction if it does not require extra instructions. 
> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. > - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. > > This patch also removes some redundant code paths and rename some incorrectly named instructions. > > Thank you very much. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From stuefe at openjdk.org Fri Jul 8 04:30:41 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 8 Jul 2022 04:30:41 GMT Subject: RFR: 8289778: ZGC: incorrect use of os::free() for mountpoint string handling after JDK-8289633 [v3] In-Reply-To: References: <7tpo8aJmpD1bL_hofv3qd0n6iMeJdxX2so6H2rbWOg8=.01e5458d-8806-407b-868e-929d2c74085d@github.com> Message-ID: <5a7TS4JiwsJszSkINGK1JoKvf1YCFETcs77Mai7a1MU=.ae29fd13-11c0-4270-b523-214ce6b5e3e7@github.com> On Thu, 7 Jul 2022 13:17:40 GMT, Martin Doerr wrote: >>> Did you add this by intention or was it added by your IDE? >> >> It was added by intention. >> Any question? >> Thanks. > > No, that's fine. My VS code often adds it, but it's not strictly required. Needed for Kim's ALLOW_xxx macros. ------------- PR: https://git.openjdk.org/jdk/pull/9387 From jwilhelm at openjdk.org Fri Jul 8 02:11:29 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Fri, 8 Jul 2022 02:11:29 GMT Subject: Integrated: Merge jdk19 In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 22:31:27 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 19 -> JDK 20 This pull request has now been integrated. 
Changeset: 01b9f95c Author: Jesper Wilhelmsson URL: https://git.openjdk.org/jdk/commit/01b9f95c62953e7f9ca10eafd42d21c634413827 Stats: 807 lines in 28 files changed: 669 ins; 52 del; 86 mod Merge ------------- PR: https://git.openjdk.org/jdk/pull/9419 From jwilhelm at openjdk.org Thu Jul 7 20:22:53 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Thu, 7 Jul 2022 20:22:53 GMT Subject: RFR: Merge jdk19 Message-ID: <-KkCuVSAMiTg4BocRz3OwWxTQ7_0TIjXqqGQ4OicoOg=.c4255a3a-29b0-4fb5-9334-fe694249ad44@github.com> Forwardport JDK 19 -> JDK 20 ------------- Commit messages: - Merge - 8289486: Improve XSLT XPath operators count efficiency - 8289779: Map::replaceAll javadoc has redundant @throws clauses - 8289558: Need spec clarification of j.l.foreign.*Layout - 8289196: Pattern domination not working properly for record patterns - 6509045: {@inheritDoc} only copies one instance of the specified exception - 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing - 8289857: ProblemList jdk/jfr/event/runtime/TestActiveSettingEvent.java - 8289840: ProblemList vmTestbase/nsk/jdwp/ThreadReference/ForceEarlyReturn/forceEarlyReturn002/forceEarlyReturn002.java when run with vthread wrapper - 8289841: ProblemList vmTestbase/gc/gctests/MemoryEaterMT/MemoryEaterMT.java with ZGC on windows The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=jdk&pr=9415&range=00.0 - jdk19: https://webrevs.openjdk.org/?repo=jdk&pr=9415&range=00.1 Changes: https://git.openjdk.org/jdk/pull/9415/files Stats: 803 lines in 28 files changed: 669 ins; 48 del; 86 mod Patch: https://git.openjdk.org/jdk/pull/9415.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9415/head:pull/9415 PR: https://git.openjdk.org/jdk/pull/9415 From dchuyko at openjdk.org Thu Jul 7 21:34:33 2022 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Thu, 7 Jul 2022 21:34:33 GMT Subject: RFR: 8282322: AArch64: 
Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 16:09:57 GMT, Andrew Haley wrote: >> Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8282322 >> - Merge branch 'openjdk:master' into JDK-8282322 >> - Merge branch 'openjdk:master' into JDK-8282322 >> - Moved 2 prfm-s from wrong ifdef branch >> - Removed unnecessary changes (forced UseLSE, blank lines) >> - Merge branch 'openjdk:master' into JDK-8282322 >> - Merge branch 'openjdk:master' into JDK-8282322 >> - Use LSE in linux-aarch64 asm code if __ARM_FEATURE_ATOMICS is on >> - Revert "hardlse feature" >> >> This reverts commit c5da85d3282bb995f69639f8f592cc94560916c5. >> - Merge branch 'openjdk:master' into JDK-8282322 >> - ... and 1 more: https://git.openjdk.org/jdk/compare/ffee357a...d1ae97d9 > > Is this dead, or... ? @theRealAph I have made the latest edits as per your comments and they are pending review. ------------- PR: https://git.openjdk.org/jdk/pull/8779 From duke at openjdk.org Thu Jul 7 21:37:36 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Thu, 7 Jul 2022 21:37:36 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2 [v2] In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: > A trampoline stub could be generated for each runtime call. These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. 
> > Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Rename variables ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9405/files - new: https://git.openjdk.org/jdk/pull/9405/files/8e0f473a..0c225a66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=00-01 Stats: 20 lines in 4 files changed: 1 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/9405.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9405/head:pull/9405 PR: https://git.openjdk.org/jdk/pull/9405 From dholmes at openjdk.org Fri Jul 8 00:08:07 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Jul 2022 00:08:07 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp [v2] In-Reply-To: References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: On Tue, 5 Jul 2022 16:47:27 GMT, Ioi Lam wrote: >> Please review this simple change that only renames a few classes and moved some code around. No functional changes. >> >> The following classes are used only sparingly. 
They should be moved to a new header file share/runtime/suspend.hpp to minimize the size of os.hpp
>>
>> - SuspendedThreadTaskContext
>> - SuspendedThreadTask
>> - SuspendResume
>>
>> I didn't move the OS-specific implementation to a new file -- the POSIX implementation is currently inside [signals_posix.cpp](https://github.com/openjdk/jdk/blob/df063f7db18a40ea7325fe608b3206a6dff812c1/src/hotspot/os/posix/signals_posix.cpp#L1790) mixed with other signal handling code, so it doesn't seem a good idea to move out just the code for the above 3 classes.
>>
>> The only other implementation is in os_windows.cpp. I could move the code to suspend_windows.cpp, but I don't feel very motivated unless someone insists.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
>   moved SuspendResume class to os/posix directory

Looks good. Thanks.

src/hotspot/share/runtime/suspendedThreadTask.cpp line 26:

> 24:
> 25: #include "precompiled.hpp"
> 26: #include "runtime/atomic.hpp"

I don't think this is needed.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9371

From dlong at openjdk.org Thu Jul 7 22:50:28 2022
From: dlong at openjdk.org (Dean Long)
Date: Thu, 7 Jul 2022 22:50:28 GMT
Subject: RFR: 8288477: nmethod header size reduction
In-Reply-To:
References:
Message-ID:

On Fri, 24 Jun 2022 09:00:26 GMT, Boris Ulasevich wrote:

>> src/hotspot/share/compiler/compilerDefinitions.hpp line 57:
>>
>>> 55:
>>> 56: // Enumeration to distinguish tiers of compilation
>>> 57: enum CompLevel : s1 {
>>
>> Hope it won't cause gcc to generate inefficient code manipulating bytes.
>
> AARCH and AMD have load byte instructions (ldr:ldrb, mov:movzx), I believe method::comp_level() code takes the same number of instructions before/after the change.

This enum change might be hard to back-port to jdk11, which still uses an older toolchain, at least for Oracle builds.
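For readers unfamiliar with the `enum CompLevel : s1` syntax under review, here is a minimal sketch of what a fixed underlying type does to the size of an enum and of a struct that holds it. Everything in it is an assumption made for illustration: `CompLevelWide`, `CompLevelNarrow`, `HeaderWide`, `HeaderNarrow` and the enumerator values are hypothetical, and `std::int8_t` merely stands in for HotSpot's `s1`. It is not the HotSpot definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical enum without a fixed underlying type; its size is
// implementation-defined (commonly the size of int).
enum CompLevelWide { wide_none = 0, wide_simple = 1, wide_full = 4 };

// Same values with a one-byte underlying type (assumed analogue of `: s1`).
enum CompLevelNarrow : std::int8_t { narrow_none = 0, narrow_simple = 1, narrow_full = 4 };

// Hypothetical headers that store a flag next to the level.
struct HeaderWide   { bool flag; CompLevelWide   level; };
struct HeaderNarrow { bool flag; CompLevelNarrow level; };

// Small accessors so the sizes can be checked at run time.
std::size_t narrow_level_size()  { return sizeof(CompLevelNarrow); }
std::size_t narrow_header_size() { return sizeof(HeaderNarrow); }
std::size_t wide_header_size()   { return sizeof(HeaderWide); }
```

With common GCC, Clang and MSVC defaults the unfixed enum takes the size of `int`, so `HeaderWide` typically occupies 8 bytes and `HeaderNarrow` 2. The fixed-underlying-type form requires C++11, which may be relevant to the toolchain concern raised above.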
------------- PR: https://git.openjdk.org/jdk/pull/9165 From coleenp at openjdk.org Thu Jul 7 23:30:25 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Jul 2022 23:30:25 GMT Subject: RFR: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox Message-ID: This change uses sed to change sun.hotspot.WhiteBox to jdk.test.whitebox.Whitebox, and sun/hotspot/Whitebox similarly. Due to indirect inclusions of some of the test libraries, changing a few wasn't a reliable option, and I need the new one for a different change I was looking at. The non-sed changes are for jdk/test/whitebox/WhiteBox to add some code for GC that was only added to the sun version. Also, the ClassFileInstaller has a label for sun.hotspot.Whitebox so that didn't change with the edit. Tested with tiers1-6. ------------- Commit messages: - 8271707: migrate tests to use jdk.test.whitebox.WhiteBox Changes: https://git.openjdk.org/jdk/pull/9417/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9417&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8271707 Stats: 2995 lines in 984 files changed: 6 ins; 0 del; 2989 mod Patch: https://git.openjdk.org/jdk/pull/9417.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9417/head:pull/9417 PR: https://git.openjdk.org/jdk/pull/9417 From coleenp at openjdk.org Thu Jul 7 20:32:41 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Jul 2022 20:32:41 GMT Subject: Integrated: 8278923: Document Klass::is_loader_alive In-Reply-To: References: Message-ID: <6LxHHoQBnAbE7Y8aLRlDQFGG-YOG3dikGR1phf1BlCw=.04bf2731-9d40-4033-8853-416639f93cc6@github.com> On Wed, 6 Jul 2022 15:07:37 GMT, Coleen Phillimore wrote: > This trivial change just adds a comment to Klass::is_loader_alive. This pull request has now been integrated. 
Changeset: 8cdead0c Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/8cdead0c94094a025c48eaefc7a3ef0c36a9629e Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8278923: Document Klass::is_loader_alive Reviewed-by: dholmes, iklam ------------- PR: https://git.openjdk.org/jdk/pull/9400 From iklam at openjdk.org Fri Jul 8 04:29:36 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 04:29:36 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp [v3] In-Reply-To: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: > Please review this simple change that only renames a few classes and moved some code around. No functional changes. > > The following classes are used only sparingly. They should be moved to a new header file share/runtime/suspend.hpp to minimize the size of os.hpp > > - SuspendedThreadTaskContext > - SuspendedThreadTask > - SuspendResume > > I didn't move the OS-specific implementation to a new file -- the POSIX implementation is currently inside [signals_posix.cpp](https://github.com/openjdk/jdk/blob/df063f7db18a40ea7325fe608b3206a6dff812c1/src/hotspot/os/posix/signals_posix.cpp#L1790) mixed with other signal handling code, so it doesn't seem a good idea to move out just the code for the above 3 classes. > > The only other implementation is in os_windows.cpp. I could move the code to suspend_windows.cpp, but I don't feel very motivated unless someone insists. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8289710-move-suspend-classes-out-of-os-hpp - moved SuspendResume class to os/posix directory - 8289710: Move Suspend/Resume classes out of os.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9371/files - new: https://git.openjdk.org/jdk/pull/9371/files/8265ecb2..46ab65b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9371&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9371&range=01-02 Stats: 12133 lines in 385 files changed: 5838 ins; 2311 del; 3984 mod Patch: https://git.openjdk.org/jdk/pull/9371.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9371/head:pull/9371 PR: https://git.openjdk.org/jdk/pull/9371 From duke at openjdk.org Fri Jul 8 00:15:20 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Fri, 8 Jul 2022 00:15:20 GMT Subject: RFR: 8263377: Store method handle linkers in the 'non-nmethods' heap [v4] In-Reply-To: References: Message-ID: > 8263377: Store method handle linkers in the 'non-nmethods' heap Yi-Fan Tsai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics - Post dynamic_code_generate event when MH intrinsic generated - Remove dead codes remove unused argument of NativeJump::check_verified_entry_alignment remove unused argument of NativeJumip::patch_verified_entry remove dead codes in SharedRuntime::generate_method_handle_intrinsic_wrapper - Add PrintCodeCache support - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics - Move to RuntimeBlob - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics - Move MHI to BufferBlob - Change _code to CodeBlob - ... 
and 8 more: https://git.openjdk.org/jdk/compare/35156041...d92b8647 ------------- Changes: https://git.openjdk.org/jdk/pull/8760/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=8760&range=03 Stats: 588 lines in 58 files changed: 279 ins; 176 del; 133 mod Patch: https://git.openjdk.org/jdk/pull/8760.diff Fetch: git fetch https://git.openjdk.org/jdk pull/8760/head:pull/8760 PR: https://git.openjdk.org/jdk/pull/8760 From jwilhelm at openjdk.org Thu Jul 7 22:40:28 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Thu, 7 Jul 2022 22:40:28 GMT Subject: RFR: Merge jdk19 Message-ID: Forwardport JDK 19 -> JDK 20 ------------- Commit messages: - Merge - 8289486: Improve XSLT XPath operators count efficiency - 8289779: Map::replaceAll javadoc has redundant @throws clauses - 8289558: Need spec clarification of j.l.foreign.*Layout - 8289196: Pattern domination not working properly for record patterns - 6509045: {@inheritDoc} only copies one instance of the specified exception - 8288949: serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java failing - 8289857: ProblemList jdk/jfr/event/runtime/TestActiveSettingEvent.java - 8289840: ProblemList vmTestbase/nsk/jdwp/ThreadReference/ForceEarlyReturn/forceEarlyReturn002/forceEarlyReturn002.java when run with vthread wrapper - 8289841: ProblemList vmTestbase/gc/gctests/MemoryEaterMT/MemoryEaterMT.java with ZGC on windows The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=jdk&pr=9419&range=00.0 - jdk19: https://webrevs.openjdk.org/?repo=jdk&pr=9419&range=00.1 Changes: https://git.openjdk.org/jdk/pull/9419/files Stats: 807 lines in 28 files changed: 669 ins; 52 del; 86 mod Patch: https://git.openjdk.org/jdk/pull/9419.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9419/head:pull/9419 PR: https://git.openjdk.org/jdk/pull/9419 From jwilhelm at openjdk.org Thu Jul 7 22:27:22 2022 From: jwilhelm at openjdk.org 
(Jesper Wilhelmsson) Date: Thu, 7 Jul 2022 22:27:22 GMT Subject: RFR: Merge jdk19 [v2] In-Reply-To: <-KkCuVSAMiTg4BocRz3OwWxTQ7_0TIjXqqGQ4OicoOg=.c4255a3a-29b0-4fb5-9334-fe694249ad44@github.com> References: <-KkCuVSAMiTg4BocRz3OwWxTQ7_0TIjXqqGQ4OicoOg=.c4255a3a-29b0-4fb5-9334-fe694249ad44@github.com> Message-ID: <2HegrqiJuBLT64CHm3G3bNnBtYc0U07ECeoRV1pKbWA=.3c35644c-6a60-48fe-b0cc-d0e41c0c9ee1@github.com> > Forwardport JDK 19 -> JDK 20 Jesper Wilhelmsson has updated the pull request incrementally with one additional commit since the last revision: Fix merge error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9415/files - new: https://git.openjdk.org/jdk/pull/9415/files/0f86db4f..c6949b8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9415&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9415&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9415.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9415/head:pull/9415 PR: https://git.openjdk.org/jdk/pull/9415 From aph at openjdk.org Fri Jul 8 10:07:42 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 8 Jul 2022 10:07:42 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:28:06 GMT, Andrew Haley wrote: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. OK, thanks. Any clue about a reproducer? I see it's G1, and involves Unsafe, but these might not be requirements. 
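As a rough illustration of the restructuring described in the quoted summary (one switch on an opcode field instead of a chain of if-then-elses), consider the sketch below. It is not the HotSpot patching code: `Kind`, `classify` and `patch` are hypothetical names, and only the A64 `B` and `BL` immediate branches are handled, assuming their usual encoding with the opcode in bits 31..26 and a signed word offset in `imm26`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the instructions this sketch can patch.
enum class Kind { Branch, BranchLink, Other };

// Dispatch once on the 6-bit major opcode (bits 31..26).
Kind classify(std::uint32_t insn) {
  switch (insn >> 26) {
    case 0x05u: return Kind::Branch;      // 000101: B  imm26
    case 0x25u: return Kind::BranchLink;  // 100101: BL imm26
    default:    return Kind::Other;
  }
}

// Rewrite the imm26 field of B/BL so the branch spans byte_offset bytes
// (assumed to be a multiple of 4 and in range); other instructions are
// returned unchanged.
std::uint32_t patch(std::uint32_t insn, std::int64_t byte_offset) {
  switch (classify(insn)) {
    case Kind::Branch:
    case Kind::BranchLink: {
      std::uint32_t imm26 =
          static_cast<std::uint32_t>(byte_offset / 4) & 0x03ffffffu;
      return (insn & 0xfc000000u) | imm26;
    }
    case Kind::Other:
      break;
  }
  return insn;
}
```

For example, `patch(0x14000000u, 8)` yields `0x14000002u`, a `B` two instructions forward. A compiler can lower such a switch to a jump table or a short comparison tree, which is one plausible reading of the "faster" claim in the summary, though the actual change may differ.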
------------- PR: https://git.openjdk.org/jdk/pull/9398 From iklam at openjdk.org Fri Jul 8 05:42:32 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 05:42:32 GMT Subject: Integrated: 8289710: Move Suspend/Resume classes out of os.hpp In-Reply-To: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: On Mon, 4 Jul 2022 23:07:27 GMT, Ioi Lam wrote: > Please review this simple change that only renames a few classes and moved some code around. No functional changes. > > The following classes are used only sparingly. They should be moved to a new header file share/runtime/suspend.hpp to minimize the size of os.hpp > > - SuspendedThreadTaskContext > - SuspendedThreadTask > - SuspendResume > > I didn't move the OS-specific implementation to a new file -- the POSIX implementation is currently inside [signals_posix.cpp](https://github.com/openjdk/jdk/blob/df063f7db18a40ea7325fe608b3206a6dff812c1/src/hotspot/os/posix/signals_posix.cpp#L1790) mixed with other signal handling code, so it doesn't seem a good idea to move out just the code for the above 3 classes. > > The only other implementation is in os_windows.cpp. I could move the code to suspend_windows.cpp, but I don't feel very motivated unless someone insists. This pull request has now been integrated. 
Changeset: 1fec62f2
Author: Ioi Lam
URL: https://git.openjdk.org/jdk/commit/1fec62f299294a0c3b3c639883cdcdc8f1410224
Stats: 379 lines in 14 files changed: 229 ins; 114 del; 36 mod

8289710: Move Suspend/Resume classes out of os.hpp

Reviewed-by: dholmes, coleenp

-------------

PR: https://git.openjdk.org/jdk/pull/9371

From stuefe at openjdk.org Fri Jul 8 08:16:08 2022
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 8 Jul 2022 08:16:08 GMT
Subject: [jdk19] Integrated: 8289799: Build warning in methodData.cpp memset zero-length parameter
In-Reply-To:
References:
Message-ID: <2TbLWR7MvmoUZvgEgltwyYvl3UydThcmGzsMOS7U5eU=.90325337-e725-473f-a727-d25d0e35c671@github.com>

On Thu, 7 Jul 2022 13:50:36 GMT, Thomas Stuefe wrote:

> Trivial clean backport, prevents gcc warnings on Fedora 12 GCC 8.3
>
> The commit being backported was authored by Thomas Stuefe on 7 Jul 2022 and was reviewed by Jie Fu and Lutz Schmidt.
>
> Thanks!

This pull request has now been integrated.

Changeset: ea21c465
Author: Thomas Stuefe
URL: https://git.openjdk.org/jdk19/commit/ea21c46531e8095c12153f787a24715eb8efbb03
Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod

8289799: Build warning in methodData.cpp memset zero-length parameter

Backport-of: cce77a700141a854bafaa5ccb33db026affcf322

-------------

PR: https://git.openjdk.org/jdk19/pull/119

From rehn at openjdk.org Fri Jul 8 06:58:47 2022
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 8 Jul 2022 06:58:47 GMT
Subject: RFR: 8286957: Held monitor count [v8]
In-Reply-To:
References:
Message-ID: <0GNzQNr0pL3jaXc2rKj6or-SaRdq4EJqHJ1bpeSuch4=.04c57f6d-dddc-47b3-9132-56c3f9a38bab@github.com>

On Thu, 7 Jul 2022 19:09:42 GMT, Erik Österlund wrote:

>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fixed strw, zero rename and made methods return 64 bit counter in all cases
>
> I could not find any error. Looks good to me.

@fisk thank you!
@pron thank you!

It passed loom t1-5 also.
------------- PR: https://git.openjdk.org/jdk/pull/8945 From dchuyko at openjdk.org Fri Jul 8 08:59:30 2022 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 8 Jul 2022 08:59:30 GMT Subject: RFR: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: On Fri, 17 Jun 2022 20:06:55 GMT, Dmitry Chuyko wrote: >> On AArch64 it is sometimes convenient to have LSE atomics right from the start. Currently they are enabled after feature detection and RR reverse debugger works incorrectly. >> >> New build configuration feature 'hardlse' is added. If it is enabled for aarch64 type of build, then statically compiled stubs replace the initial pessimistic implementation and dynamically generated replacements (when LSE support is detected). The feature works for builds of all debug levels. >> >> New file atomic_linux_aarch64_lse.S is derived from atomic_linux_aarch64.S and inherits its copyright. This alternative static implementation corresponds to the dynamically generated code. >> >> Note, this configuration part is necessary but not sufficient to fully avoid strex instructions for practical purposes. Other parts are: >> >> * Run on the OS built without strex family instructions. E.g. Amazon Linux 2022. >> * Compile with outline atomics enabled and the configuration flag enabled. E.g. configure with >> --with-extra-cflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-cxxflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-ldflags='-Wl,--allow-multiple-definition' --with-jvm-features=hardlse >> >> Testing: tier1, tier2 on linux-aarch64 release builds with feature off and feature on. > > Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Moved 2 prfm-s from wrong ifdef branch > - Removed unnecessary changes (forced UseLSE, blank lines) > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Use LSE in linux-aarch64 asm code if __ARM_FEATURE_ATOMICS is on > - Revert "hardlse feature" > > This reverts commit c5da85d3282bb995f69639f8f592cc94560916c5. > - Merge branch 'openjdk:master' into JDK-8282322 > - ... and 1 more: https://git.openjdk.org/jdk/compare/d69d43b0...d1ae97d9 Andrew, thanks for taking a look. This change is now for the master, later we can also consider update releases. ------------- PR: https://git.openjdk.org/jdk/pull/8779 From dholmes at openjdk.org Fri Jul 8 06:15:42 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Jul 2022 06:15:42 GMT Subject: RFR: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox In-Reply-To: References: Message-ID: <-l-lc8D0eUr1WpTplIgI8Zz2VDpvUEA_LCkxovQueek=.53c06318-45d5-4637-8fe4-adfcea65b121@github.com> On Thu, 7 Jul 2022 20:43:09 GMT, Coleen Phillimore wrote: > This change uses sed to change sun.hotspot.WhiteBox to jdk.test.whitebox.Whitebox, and sun/hotspot/Whitebox similarly. Due to indirect inclusions of some of the test libraries, changing a few wasn't a reliable option, and I need the new one for a different change I was looking at. > The non-sed changes are for jdk/test/whitebox/WhiteBox to add some code for GC that was only added to the sun version. > Also, the ClassFileInstaller has a label for sun.hotspot.Whitebox so that didn't change with the edit. > Tested with tiers1-6. I skimmed the diff and this seems fine. Are we not going to remove the old WhiteBox at the same time? Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9417 From tschatzl at openjdk.org Fri Jul 8 08:51:42 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Jul 2022 08:51:42 GMT Subject: RFR: 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize [v2] In-Reply-To: References: Message-ID: > Hi all, > > can I get reviews for this enhancement fixing a sometimes annoying UI issue where setting `-XX:MinTLABSize` does not automatically update `-XX:YoungPLABSize` and `-XX:OldPLABSize` if they are not set? > This avoids some unnecessary retries. > > Testing: gha, test case > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Fix minimal vm compilation? ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9425/files - new: https://git.openjdk.org/jdk/pull/9425/files/e59d06d9..7d7382bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9425&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9425&range=00-01 Stats: 3 lines in 2 files changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9425.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9425/head:pull/9425 PR: https://git.openjdk.org/jdk/pull/9425 From dholmes at openjdk.org Fri Jul 8 06:58:46 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Jul 2022 06:58:46 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:28:06 GMT, Andrew Haley wrote: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. To be on the safe side I'm putting this through our internal testing. Please hold off integrating until I give it the green light. Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 8 07:14:58 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 8 Jul 2022 07:14:58 GMT Subject: RFR: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: On Fri, 17 Jun 2022 20:06:55 GMT, Dmitry Chuyko wrote: >> On AArch64 it is sometimes convenient to have LSE atomics right from the start. Currently they are enabled after feature detection and RR reverse debugger works incorrectly. >> >> New build configuration feature 'hardlse' is added. If it is enabled for aarch64 type of build, then statically compiled stubs replace the initial pessimistic implementation and dynamically generated replacements (when LSE support is detected). The feature works for builds of all debug levels. >> >> New file atomic_linux_aarch64_lse.S is derived from atomic_linux_aarch64.S and inherits its copyright. This alternative static implementation corresponds to the dynamically generated code. >> >> Note, this configuration part is necessary but not sufficient to fully avoid strex instructions for practical purposes. Other parts are: >> >> * Run on the OS built without strex family instructions. E.g. Amazon Linux 2022. >> * Compile with outline atomics enabled and the configuration flag enabled. E.g. configure with >> --with-extra-cflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-cxxflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-ldflags='-Wl,--allow-multiple-definition' --with-jvm-features=hardlse >> >> Testing: tier1, tier2 on linux-aarch64 release builds with feature off and feature on. > > Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision:
>
> - Merge branch 'openjdk:master' into JDK-8282322
> - Merge branch 'openjdk:master' into JDK-8282322
> - Merge branch 'openjdk:master' into JDK-8282322
> - Moved 2 prfm-s from wrong ifdef branch
> - Removed unnecessary changes (forced UseLSE, blank lines)
> - Merge branch 'openjdk:master' into JDK-8282322
> - Merge branch 'openjdk:master' into JDK-8282322
> - Use LSE in linux-aarch64 asm code if __ARM_FEATURE_ATOMICS is on
> - Revert "hardlse feature"
>
>   This reverts commit c5da85d3282bb995f69639f8f592cc94560916c5.
> - Merge branch 'openjdk:master' into JDK-8282322
> - ... and 1 more: https://git.openjdk.org/jdk/compare/c32c62f5...d1ae97d9

Marked as reviewed by aph (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/8779

From rehn at openjdk.org Fri Jul 8 07:15:02 2022
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 8 Jul 2022 07:15:02 GMT
Subject: Integrated: 8286957: Held monitor count
In-Reply-To:
References:
Message-ID:

On Mon, 30 May 2022 11:04:11 GMT, Robbin Ehn wrote:

> The current implementation does not count every monitor enter, counts high up in the abstraction, and causes a performance regression on aarch64 with some benchmarks due to C2 changes.
>
> This change makes the counting exact by pushing the counting down in the abstraction.
> The additional JNI counter is not strictly needed, but it enables us to figure out if we have monitors "on stack".
>
> An uncontended lock plus unlock is 1 ns (21.5 -> 22.5) slower in C2 compiled code on x64 with the additional increment and decrement.
>
> Fixed aarch64, x64, x86 and zero.
>
> Passes t1-8

This pull request has now been integrated.
Changeset: ac399e97 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/ac399e9777731e7a9cbc2ad3396acfa5358b1c76 Stats: 562 lines in 44 files changed: 324 ins; 155 del; 83 mod 8286957: Held monitor count Reviewed-by: rpressler, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/8945 From coleenp at openjdk.org Fri Jul 8 12:51:42 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 12:51:42 GMT Subject: RFR: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox In-Reply-To: <-l-lc8D0eUr1WpTplIgI8Zz2VDpvUEA_LCkxovQueek=.53c06318-45d5-4637-8fe4-adfcea65b121@github.com> References: <-l-lc8D0eUr1WpTplIgI8Zz2VDpvUEA_LCkxovQueek=.53c06318-45d5-4637-8fe4-adfcea65b121@github.com> Message-ID: On Fri, 8 Jul 2022 06:12:18 GMT, David Holmes wrote: >> This change uses sed to change sun.hotspot.WhiteBox to jdk.test.whitebox.Whitebox, and sun/hotspot/Whitebox similarly. Due to indirect inclusions of some of the test libraries, changing a few wasn't a reliable option, and I need the new one for a different change I was looking at. >> The non-sed changes are for jdk/test/whitebox/WhiteBox to add some code for GC that was only added to the sun version. >> Also, the ClassFileInstaller has a label for sun.hotspot.Whitebox so that didn't change with the edit. >> Tested with tiers1-6. > > I skimmed the diff and this seems fine. > > Are we not going to remove the old WhiteBox at the same time? > > Thanks. @dholmes-ora We'll remove it with this patch JDK-8275662 and the other obsolete sun.hotspot test classes which will be easier to look at. Thanks for reviewing. 
-------------

PR: https://git.openjdk.org/jdk/pull/9417

From bulasevich at openjdk.org Fri Jul 8 13:19:44 2022
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Fri, 8 Jul 2022 13:19:44 GMT
Subject: RFR: 8288477: nmethod header size reduction
In-Reply-To:
References:
Message-ID: <2JYSVXeZkXo-M94Kpk0Det5C6iqCNNpDyLeISH32zk4=.c5ad058e-cae5-4430-99f0-d312c2d5074d@github.com>

On Thu, 7 Jul 2022 15:05:14 GMT, Rémi Forax wrote:

> It's also a good idea to have 64 bytes in between volatiles so they are not in the same cache-line

Actually, there is not much space in nmethod. Can you suggest a better layout?:

/* offset | size */ type = class nmethod : public CompiledMethod {
  /* 200 | 8 */ uint64_t _gc_epoch;
> /* 208 | 8 */ volatile int64_t _stack_traversal_mark;
  /* 216 | 8 */ nmethod *_osr_link;
> /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link;
  /* 232 | 8 */ address _entry_point;
  /* 240 | 8 */ address _verified_entry_point;
  /* 248 | 8 */ address _osr_entry_point;
  /* 256 | 4 */ int _entry_bci;
  /* 260 | 4 */ int _exception_offset;
  /* 264 | 4 */ int _unwind_handler_offset;
  /* 268 | 4 */ int _consts_offset;
  /* 272 | 4 */ int _stub_offset;
  /* 276 | 4 */ int _oops_offset;
  /* 280 | 4 */ int _metadata_offset;
  /* 284 | 4 */ int _scopes_data_offset;
  /* 288 | 4 */ int _scopes_pcs_offset;
  /* 292 | 4 */ int _dependencies_offset;
  /* 296 | 4 */ int _handler_table_offset;
  /* 300 | 4 */ int _nul_chk_table_offset;
  /* 304 | 4 */ int _speculations_offset;
  /* 308 | 4 */ int _jvmci_data_offset;
  /* 312 | 4 */ int _nmethod_end_offset;
  /* 316 | 4 */ int _orig_pc_offset;
  /* 320 | 4 */ int _compile_id;
  /* 324 | 4 */ RTMState _rtm_state;
> /* 328 | 4 */ volatile jint _lock_count;
  /* 332 | 4 */ int _hotness_counter;
  /* 336 | 4 */ ByteSize _native_receiver_sp_offset;
  /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset;
  /* 344 | 1 */ CompLevel _comp_level;
> /* 345 | 1 */ volatile uint8_t _is_unloading_state;
  /* 346 | 1 */ bool _has_flushed_dependencies;
  /* 347 | 1 */ bool _unload_reported;
  /* 348 | 1 */ bool _load_reported;
> /* 349 | 1 */ volatile signed char _state;
  /* 350 | 1 */ bool _oops_are_stale;

-------------

PR: https://git.openjdk.org/jdk/pull/9165

From bulasevich at openjdk.org Fri Jul 8 13:19:43 2022
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Fri, 8 Jul 2022 13:19:43 GMT
Subject: RFR: 8288477: nmethod header size reduction
In-Reply-To:
References:
Message-ID:

On Thu, 7 Jul 2022 18:36:43 GMT, Vladimir Kozlov wrote:

> What about CompiledMethod?

I didn't find a way to reduce the size of CompiledMethod:

(gdb) ptype /o CompiledMethod
/* offset | size */ type = class CompiledMethod : public CodeBlob {
  protected:
  /* 120 | 1 */ enum CompiledMethod::MarkForDeoptimizationStatus _mark_for_deoptimization_status;
  /* 120:23 | 4 */ unsigned int _has_unsafe_access : 1;
  /* 120:22 | 4 */ unsigned int _has_method_handle_invokes : 1;
  /* 120:21 | 4 */ unsigned int _has_wide_vectors : 1;
  /* 120:20 | 4 */ unsigned int _has_monitors : 1;
  /* XXX 4-bit hole */
  /* XXX 6-byte hole */
  /* 128 | 8 */ class Method *_method;
  /* 136 | 8 */ address _scopes_data_begin;
  /* 144 | 8 */ address _deopt_handler_begin;
  /* 152 | 8 */ address _deopt_mh_handler_begin;
  /* 160 | 32 */ class PcDescContainer {
    private:
    /* 160 | 32 */ class PcDescCache {
      private:
      /* 160 | 32 */ volatile PcDescPtr _pc_descs[4];
      /* total size (bytes): 32 */
    } _pc_desc_cache;
    /* total size (bytes): 32 */
  } _pc_desc_container;
  /* 192 | 8 */ class ExceptionCache * volatile _exception_cache;
  /* 200 | 8 */ void *_gc_data;
  /* total size (bytes): 208 */
}

> Would be interesting to profile fields access. I assume hot fields should be in first cache line. Or it is not important?

I don't have a heatmap of the nmethod structure. If the nmethod data is being actively accessed on a critical code path, it probably makes sense to move the hot fields together (it doesn't have to be the first cache line, I guess).
Yes, my concern was that reordering the fields as a side effect might impact performance due to a cache miss or something like that. I ran Renaissance Suite to make sure that performance is not affected. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From jvernee at openjdk.org Fri Jul 8 11:25:56 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 8 Jul 2022 11:25:56 GMT Subject: [jdk19] RFR: 8287809: Revisit implementation of memory session [v6] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 21:50:36 GMT, Maurizio Cimadamore wrote: >> This is a JDK 19 clone of: https://github.com/openjdk/jdk/pull/9017 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Turn non-closeable view back into MemorySession impl Marked as reviewed by jvernee (Reviewer). ------------- PR: https://git.openjdk.org/jdk19/pull/22 From bulasevich at openjdk.org Fri Jul 8 13:23:24 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 8 Jul 2022 13:23:24 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> References: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> Message-ID: On Thu, 7 Jul 2022 22:53:12 GMT, Dean Long wrote: > Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. Do you mind using the CompilerType? Since we have this type defined, I think it should be used. Does it make sense to propose this int->CompilerType cleanup as a separate change prior to this one? 
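On the layout questions raised earlier in this thread (the holes that `ptype /o` reports and the effect of reordering fields), a small sketch may help show where such holes come from. The two structs below are hypothetical and far smaller than `nmethod` or `CompiledMethod`; the field names only echo the discussion and are assumptions, not the real declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout with small fields scattered between wider ones.
// Each one-byte field is followed by padding so the next field is aligned.
struct Scattered {
  std::uint8_t state;       // 1 byte, then padding before the pointer
  void*        entry;
  std::uint8_t level;       // 1 byte, then padding before the int
  std::int32_t compile_id;
  bool         reported;    // 1 byte, then tail padding
};

// Same fields ordered from widest to narrowest, so the one-byte fields
// share a single word and only tail padding remains.
struct Grouped {
  void*        entry;
  std::int32_t compile_id;
  std::uint8_t state;
  std::uint8_t level;
  bool         reported;
};

std::size_t scattered_size() { return sizeof(Scattered); }
std::size_t grouped_size()   { return sizeof(Grouped); }
```

On a typical LP64 ABI `Scattered` occupies 32 bytes and `Grouped` 16. The sketch says nothing about which fields are hot, so it does not address the cache-line concern; it only shows the size effect of grouping.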
------------- PR: https://git.openjdk.org/jdk/pull/9165 From iklam at openjdk.org Fri Jul 8 05:42:31 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 05:42:31 GMT Subject: RFR: 8289710: Move Suspend/Resume classes out of os.hpp [v2] In-Reply-To: References: <1nrk_DY_T3k1_mAl9y7g482aoxB3tqNOgGdIOZu2ebw=.1b315ee3-bac9-49af-9b2b-4abd9446cebd@github.com> Message-ID: <_EnUBz76NDTkijJgArXas1gP9x8JsmYlMwMfHVQF1ew=.6b722ce1-200d-49d0-bce4-b2ceee83a278@github.com> On Wed, 6 Jul 2022 13:11:32 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> moved SuspendResume class to os/posix directory > > The movement out of class os looks like an improvement to me. Thanks @coleenp and @dholmes-ora for the review. ------------- PR: https://git.openjdk.org/jdk/pull/9371 From iklam at openjdk.org Fri Jul 8 06:27:56 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 06:27:56 GMT Subject: RFR: 8265473: Move os::Linux to its own header file Message-ID: Another step of moving unnecessary stuff outside of os.hpp The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. 
------------- Commit messages: - 8265473: Move os::Linux to its own header file Changes: https://git.openjdk.org/jdk/pull/9423/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9423&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8265473 Stats: 431 lines in 12 files changed: 16 ins; 401 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/9423.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9423/head:pull/9423 PR: https://git.openjdk.org/jdk/pull/9423 From aph at openjdk.org Fri Jul 8 07:14:59 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 8 Jul 2022 07:14:59 GMT Subject: RFR: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: <-hjWwRXyu9iKpZOqeoSjvNxb2yRS3Q0P7V7AMpzbq9w=.20bc99b8-1bff-4f83-9f92-4bbc2c74368b@github.com> On Thu, 7 Jul 2022 18:20:32 GMT, Paul Hohensee wrote: > Andrew, looks like Dima will be good to go once you re-review it. Fine by me, but note where we are in rampdown. ------------- PR: https://git.openjdk.org/jdk/pull/8779 From aph at openjdk.org Fri Jul 8 07:07:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 8 Jul 2022 07:07:41 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 06:56:37 GMT, David Holmes wrote: > To be on the safe side I'm putting this through our internal testing. Please hold off integrating until I give it the green light. Thanks. OK, thanks. There's no hurry, and no need to get this one into the next release. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From dchuyko at openjdk.org Fri Jul 8 08:59:31 2022 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 8 Jul 2022 08:59:31 GMT Subject: Integrated: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions In-Reply-To: References: Message-ID: On Wed, 18 May 2022 19:05:03 GMT, Dmitry Chuyko wrote: > On AArch64 it is sometimes convenient to have LSE atomics right from the start. Currently they are enabled after feature detection and RR reverse debugger works incorrectly. > > New build configuration feature 'hardlse' is added. If it is enabled for aarch64 type of build, then statically compiled stubs replace the initial pessimistic implementation and dynamically generated replacements (when LSE support is detected). The feature works for builds of all debug levels. > > New file atomic_linux_aarch64_lse.S is derived from atomic_linux_aarch64.S and inherits its copyright. This alternative static implementation corresponds to the dynamically generated code. > > Note, this configuration part is necessary but not sufficient to fully avoid strex instructions for practical purposes. Other parts are: > > * Run on the OS built without strex family instructions. E.g. Amazon Linux 2022. > * Compile with outline atomics enabled and the configuration flag enabled. E.g. configure with > --with-extra-cflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-cxxflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-ldflags='-Wl,--allow-multiple-definition' --with-jvm-features=hardlse > > Testing: tier1, tier2 on linux-aarch64 release builds with feature off and feature on. This pull request has now been integrated. 
Changeset: a13af650 Author: Dmitry Chuyko URL: https://git.openjdk.org/jdk/commit/a13af650437de508d64f0b12285a6ffc9901f85f Stats: 83 lines in 2 files changed: 74 ins; 0 del; 9 mod 8282322: AArch64: Provide a means to eliminate all STREX family of instructions Reviewed-by: ngasson, aph ------------- PR: https://git.openjdk.org/jdk/pull/8779 From dholmes at openjdk.org Fri Jul 8 09:56:45 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Jul 2022 09:56:45 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:28:06 GMT, Andrew Haley wrote: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. A lot of failures around one assertion AFAICS: # Internal Error (/opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S10227/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/bfa54a47-3090-417b-b4b7-6433109e172e/runs/91d70c18-da15-420e-99f5-35e9f1ce15cb/workspace/open/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:204), pid=844793, tid=844815 # assert(target_addr_for_insn(insn_addr) == target) failed: should be # # JRE version: Java(TM) SE Runtime Environment (20.0) (fastdebug build 20-internal-2022-07-08-0655086.david.holmes.jdk-dev3.git) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-internal-2022-07-08-0655086.david.holmes.jdk-dev3.git, mixed mode, compressed class ptrs, z gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0x13fda54] MacroAssembler::pd_patch_instruction_size(unsigned char*, unsigned char*)+0x114 # --------------- T H R E A D --------------- Current thread (0x0000fffd502e3bc0): JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=844815, stack(0x0000fffd31c00000,0x0000fffd31e00000)] Current CompileTask: C2: 383 95 b compiler.unsafe.UnsafeGetConstantField::checkGetAddress (10 bytes) Stack: [0x0000fffd31c00000,0x0000fffd31e00000], 
sp=0x0000fffd31dfa120, free space=2024k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x13fda54] MacroAssembler::pd_patch_instruction_size(unsigned char*, unsigned char*)+0x114 V [libjvm.so+0x16febd0] Relocation::pd_set_data_value(unsigned char*, long, bool)+0x40 V [libjvm.so+0x16f8f5c] external_word_Relocation::fix_relocation_after_move(CodeBuffer const*, CodeBuffer*)+0x8c V [libjvm.so+0xa3d444] CodeBuffer::relocate_code_to(CodeBuffer*) const+0x480 V [libjvm.so+0xa405c4] CodeBuffer::copy_code_to(CodeBlob*)+0x90 V [libjvm.so+0x155600c] nmethod::nmethod(Method*, CompilerType, int, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int, char*, int, int)+0x3ec V [libjvm.so+0x1556640] nmethod::new_nmethod(methodHandle const&, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int, char*, int, int, char const*, FailedSpeculation**)+0x270 V [libjvm.so+0x94c144] ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, bool, bool, bool, int, RTMState)+0x314 V [libjvm.so+0x15fa92c] PhaseOutput::install_code(ciMethod*, int, AbstractCompiler*, bool, bool, RTMState)+0x148 V [libjvm.so+0xa8caac] Compile::Code_Gen()+0x3fc V [libjvm.so+0xa9104c] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x112c V [libjvm.so+0x8c0918] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1c4 V [libjvm.so+0xa9f094] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x874 V [libjvm.so+0xa9ff6c] CompileBroker::compiler_thread_loop()+0x6ac V [libjvm.so+0xfb7dc4] JavaThread::thread_main_inner()+0x250 V [libjvm.so+0x18cfaa8] Thread::call_run()+0xf8 V [libjvm.so+0x15da2d4] 
thread_native_entry(Thread*)+0x104 C [libpthread.so.0+0x78f8] start_thread+0x188 Different stacktraces. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From tschatzl at openjdk.org Fri Jul 8 08:49:23 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Jul 2022 08:49:23 GMT Subject: RFR: 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize Message-ID: Hi all, can I get reviews for this enhancement fixing a sometimes annoying UI issue where setting `-XX:MinTLABSize` does not automatically update `-XX:YoungPLABSize` and `-XX:OldPLABSize` if they are not set? This avoids some unnecessary retries. Testing: gha, test case Thanks, Thomas ------------- Commit messages: - Add test - initial version Changes: https://git.openjdk.org/jdk/pull/9425/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9425&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289137 Stats: 93 lines in 4 files changed: 93 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9425.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9425/head:pull/9425 PR: https://git.openjdk.org/jdk/pull/9425 From coleenp at openjdk.org Fri Jul 8 15:58:31 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 15:58:31 GMT Subject: RFR: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox In-Reply-To: References: Message-ID: <2mTLTyDQH8b2csvKDlgLMlfVhSm7JJXvFcuNMeqnYO0=.ab170919-9813-40be-a097-43671975d984@github.com> On Thu, 7 Jul 2022 20:43:09 GMT, Coleen Phillimore wrote: > This change uses sed to change sun.hotspot.WhiteBox to jdk.test.whitebox.Whitebox, and sun/hotspot/Whitebox similarly. Due to indirect inclusions of some of the test libraries, changing a few wasn't a reliable option, and I need the new one for a different change I was looking at. > The non-sed changes are for jdk/test/whitebox/WhiteBox to add some code for GC that was only added to the sun version. 
> Also, the ClassFileInstaller has a label for sun.hotspot.Whitebox so that didn't change with the edit. > Tested with tiers1-6. Thanks Leonid! ------------- PR: https://git.openjdk.org/jdk/pull/9417 From coleenp at openjdk.org Fri Jul 8 15:58:32 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 15:58:32 GMT Subject: Integrated: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox In-Reply-To: References: Message-ID: <7OYMQYFZ2rIDV87GtSdBA2jbeOcjjsCVQq_GWOPfEoU=.846e9278-06ec-4428-83e9-d2a883a04e5f@github.com> On Thu, 7 Jul 2022 20:43:09 GMT, Coleen Phillimore wrote: > This change uses sed to change sun.hotspot.WhiteBox to jdk.test.whitebox.Whitebox, and sun/hotspot/Whitebox similarly. Due to indirect inclusions of some of the test libraries, changing a few wasn't a reliable option, and I need the new one for a different change I was looking at. > The non-sed changes are for jdk/test/whitebox/WhiteBox to add some code for GC that was only added to the sun version. > Also, the ClassFileInstaller has a label for sun.hotspot.Whitebox so that didn't change with the edit. > Tested with tiers1-6. This pull request has now been integrated. Changeset: e7795851 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/e7795851d2e02389e63950fef939084b18ec4bfb Stats: 2994 lines in 984 files changed: 6 ins; 0 del; 2988 mod 8271707: migrate tests to use jdk.test.whitebox.WhiteBox Reviewed-by: lmesnik, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/9417 From iklam at openjdk.org Fri Jul 8 17:40:46 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 17:40:46 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v2] In-Reply-To: References: Message-ID: > Another step of moving unnecessary stuff outside of os.hpp > > The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. 
> > I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed gtest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9423/files - new: https://git.openjdk.org/jdk/pull/9423/files/8ab57889..c8bd68b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9423&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9423&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9423.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9423/head:pull/9423 PR: https://git.openjdk.org/jdk/pull/9423 From iklam at openjdk.org Fri Jul 8 18:16:35 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 18:16:35 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v3] In-Reply-To: References: Message-ID: > Another step of moving unnecessary stuff outside of os.hpp > > The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. > > I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed other linux variants ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9423/files - new: https://git.openjdk.org/jdk/pull/9423/files/c8bd68b4..5c540f8c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9423&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9423&range=01-02 Stats: 6 lines in 6 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9423.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9423/head:pull/9423 PR: https://git.openjdk.org/jdk/pull/9423 From dchuyko at openjdk.org Fri Jul 8 08:59:30 2022 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 8 Jul 2022 08:59:30 GMT Subject: RFR: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: On Fri, 17 Jun 2022 20:06:55 GMT, Dmitry Chuyko wrote: >> On AArch64 it is sometimes convenient to have LSE atomics right from the start. Currently they are enabled after feature detection and RR reverse debugger works incorrectly. >> >> New build configuration feature 'hardlse' is added. If it is enabled for aarch64 type of build, then statically compiled stubs replace the initial pessimistic implementation and dynamically generated replacements (when LSE support is detected). The feature works for builds of all debug levels. >> >> New file atomic_linux_aarch64_lse.S is derived from atomic_linux_aarch64.S and inherits its copyright. This alternative static implementation corresponds to the dynamically generated code. >> >> Note, this configuration part is necessary but not sufficient to fully avoid strex instructions for practical purposes. Other parts are: >> >> * Run on the OS built without strex family instructions. E.g. Amazon Linux 2022. >> * Compile with outline atomics enabled and the configuration flag enabled. E.g. 
configure with >> --with-extra-cflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-cxxflags='-march=armv8.3-a+crc+crypto -moutline-atomics' --with-extra-ldflags='-Wl,--allow-multiple-definition' --with-jvm-features=hardlse >> >> Testing: tier1, tier2 on linux-aarch64 release builds with feature off and feature on. > > Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Moved 2 prfm-s from wrong ifdef branch > - Removed unnecessary changes (forced UseLSE, blank lines) > - Merge branch 'openjdk:master' into JDK-8282322 > - Merge branch 'openjdk:master' into JDK-8282322 > - Use LSE in linux-aarch64 asm code if __ARM_FEATURE_ATOMICS is on > - Revert "hardlse feature" > > This reverts commit c5da85d3282bb995f69639f8f592cc94560916c5. > - Merge branch 'openjdk:master' into JDK-8282322 > - ... and 1 more: https://git.openjdk.org/jdk/compare/d69d43b0...d1ae97d9 Andrew, thanks for taking a look. This change is now for the master, later we can also consider update releases. ------------- PR: https://git.openjdk.org/jdk/pull/8779 From iklam at openjdk.org Fri Jul 8 18:20:42 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 18:20:42 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v3] In-Reply-To: References: Message-ID: <0IQDcQtZEEWEbRBF45Lrp3hpJSiBlYQcaX3o0RBWNbo=.f7455117-2e42-4a8f-9461-bb3a34d46d1b@github.com> On Fri, 8 Jul 2022 18:16:35 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. 
Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Fixed other linux variants Another place that the os::Linux file could be moved to is [os_share_linux.hpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/os_share_linux.hpp). Today it contains some outdated declarations that are not used by anyone. If we interpret the name of this header to be "shared interfaces used by the os/*.cpp files", then os::Linux would belong here. ------------- PR: https://git.openjdk.org/jdk/pull/9423 From iklam at openjdk.org Fri Jul 8 18:37:42 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 8 Jul 2022 18:37:42 GMT Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2] In-Reply-To: References: Message-ID: On Thu, 7 Jul 2022 15:12:49 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/forte.hpp line 32: >> >>> 30: class Forte : AllStatic { >>> 31: public: >>> 32: static bool is_enabled() NOT_JVMTI_RETURN_(false); >> >> I don't think the rest of this forte code is disabled by JVMTI. > > If the answer to whether it's enabled is something you want to be fast, and doesn't change, maybe make it check a variable? The code in the `Forte` class, as well as the non-trivial of `AsyncGetCallTrace()` in forte.cpp, are inside `#if INCLUDE_JVMTI`. That's why I use `NOT_JVMTI_RETURN_` for the new function. 
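[Editorial sketch, not part of the original message.] The pattern behind `NOT_JVMTI_RETURN_` can be illustrated roughly as follows. This is a simplified stand-in under stated assumptions: `INCLUDE_FEATURE`, `NOT_FEATURE_RETURN_`, `Profiler`, `g_profiler_attached` and `register_stub_if_enabled` are all hypothetical names, not the HotSpot definitions. When the feature is compiled in, the macro expands to nothing and the declaration gets a real out-of-line body; when it is compiled out, the macro supplies an inline body that returns a constant. The last function shows the start-up idea from this PR: skip formatting the name unless the check says someone will consume it.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical build-time switch standing in for INCLUDE_JVMTI.
#ifndef INCLUDE_FEATURE
#define INCLUDE_FEATURE 1
#endif

#if INCLUDE_FEATURE
#define NOT_FEATURE_RETURN_(code) /* next token must be ; */
#else
#define NOT_FEATURE_RETURN_(code) { return code; }
#endif

class Profiler {
 public:
  // Declaration only when the feature is built in;
  // an inline "return false" body when it is not.
  static bool is_enabled() NOT_FEATURE_RETURN_(false);
};

#if INCLUDE_FEATURE
// Hypothetical flag; checking a plain variable keeps the query cheap.
bool g_profiler_attached = false;
bool Profiler::is_enabled() { return g_profiler_attached; }
#endif

// Counts how often the (comparatively expensive) formatting ran.
int g_names_formatted = 0;

// Format the stub name only if a consumer is present.
void register_stub_if_enabled(const char* base, int id) {
  if (!Profiler::is_enabled()) {
    return;  // nothing to register, so skip the formatting entirely
  }
  char name[64];
  std::snprintf(name, sizeof(name), "%s_%d", base, id);
  ++g_names_formatted;
}
```

With the switch set to 0, every call to `Profiler::is_enabled()` can be folded to `false` by the compiler, so the guarded formatting code becomes dead.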
------------- PR: https://git.openjdk.org/jdk/pull/9386 From coleenp at openjdk.org Fri Jul 8 18:51:43 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 18:51:43 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v3] In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 18:16:35 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Fixed other linux variants This seems fine. The name could be os_linux_impl.hpp, since impl looks a lot like inline surrounded by dots. os_share_linux.hpp should be deleted if it's not used. I wouldn't think of looking there for anything. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/9423 From duke at openjdk.org Fri Jul 8 18:53:46 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Fri, 8 Jul 2022 18:53:46 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2 [v2] In-Reply-To: References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: On Thu, 7 Jul 2022 21:37:36 GMT, Yi-Fan Tsai wrote: >> A trampoline stub could be generated for each runtime call. These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. 
>> >> Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Rename variables src/hotspot/share/asm/codeBuffer.hpp line 429: > 427: > 428: SharedStubToInterpRequests* _shared_stub_to_interp_requests; // used to collect requests for shared iterpreter stubs > 429: SharedTrampolineRequests* _shared_trampoline_requests; // used to collect requests for shared runtime call stubs Please update the comment. ------------- PR: https://git.openjdk.org/jdk/pull/9405 From coleenp at openjdk.org Fri Jul 8 18:53:47 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 18:53:47 GMT Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2] In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 17:50:54 GMT, Ioi Lam wrote: >> `Forte::register_stub()` should be called only when the JVm is being instrumented by Forte (aka "Oracle Developer Studio") >> >> https://www.oracle.com/tools/developerstudio/downloads/developer-studio-jsp.html >> >> We currently always format the name of generated stubs and call `Forte::register_stub()`, which usually does nothing. >> >> Example: >> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2686-L2697 >> >> To improve start-up, we should check if Forte is enabled before formatting the name. >> >> I also renamed some `#ifndef IA64` around the code that I touched. 
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Do not remove Forte::register_stub as it is used on Linux as well Marked as reviewed by coleenp (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9386 From coleenp at openjdk.org Fri Jul 8 18:53:48 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 18:53:48 GMT Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2] In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 18:35:38 GMT, Ioi Lam wrote: >> If the answer to whether it's enabled is something you want to be fast, and doesn't change, maybe make it check a variable? > > The code in the `Forte` class, as well as the non-trivial of `AsyncGetCallTrace()` in forte.cpp, are inside `#if INCLUDE_JVMTI`. That's why I use `NOT_JVMTI_RETURN_` for the new function. Ok, thanks for answering my question. ------------- PR: https://git.openjdk.org/jdk/pull/9386 From dlong at openjdk.org Fri Jul 8 19:45:42 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Jul 2022 19:45:42 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> Message-ID: On Fri, 8 Jul 2022 13:20:22 GMT, Boris Ulasevich wrote: > > Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. > > Do you mind using the CompilerType? Since we have this type defined, I think it should be used. Does it make sense to propose this int->CompilerType cleanup as a separate change prior to this one? I was going to suggest doing it as a separate change after this one. 
------------- PR: https://git.openjdk.org/jdk/pull/9165 From kvn at openjdk.org Fri Jul 8 20:35:37 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Jul 2022 20:35:37 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: References: Message-ID: <6KltnZFZDqf5kuIjF_t0ns-DVjqSaLj8kFfBxS6rwt0=.aa0372ae-3c0b-4b00-b005-5431a113c9f4@github.com> On Fri, 8 Jul 2022 01:43:11 GMT, Fei Gao wrote: >> In which call to `adjust_alignment_for_type_conversion()` you got AddP node? >> Should we add checks there too? > >> In which call to `adjust_alignment_for_type_conversion()` you got AddP node? Should we add checks there too? > > Thanks for your review, @vnkozlov . > > When we called `adjust_alignment_for_type_conversion()` in `SuperWord::follow_def_uses()`, https://github.com/openjdk/jdk/blob/3f1174aa4709aabcfde8b40deec88b8ed466cc06/src/hotspot/share/opto/superword.cpp#L1525, we got AddP node. In this function, we also call `stmts_can_pack()` on the next line, which has checks to prevent unwanted pairs, https://github.com/openjdk/jdk/blob/3f1174aa4709aabcfde8b40deec88b8ed466cc06/src/hotspot/share/opto/superword.cpp#L1202. Maybe we don't have to add one more. WDYT? @fg1417 `stmts_can_pack()` is called in another place, where it is preceded by an `are_adjacent_refs()` call that also has a primitive type check (but a different one). I was thinking of converting the checks in `stmts_can_pack()` to `assert`. But, on the other hand, `is_java_primitive(bt)` is cheap, and I would prefer to keep the checks in `stmts_can_pack()` as they are in case we call it from another place. Anyway, after looking at the code I agree with your current changes. Let me test it. You also need a second review.
------------- PR: https://git.openjdk.org/jdk/pull/9391 From kvn at openjdk.org Fri Jul 8 20:38:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Jul 2022 20:38:40 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> References: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> Message-ID: On Thu, 7 Jul 2022 22:53:12 GMT, Dean Long wrote: > Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. > > There is also a lot of unnecessary space used by these addresses: address _code_begin; address _code_end; address _content_begin; address _data_end; address _relocation_begin; address _relocation_end; > > Now that AOT has been removed, we could go back to 3 int fields like in jdk8. There is Leyden project for which we may need it. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From dlong at openjdk.org Fri Jul 8 20:45:41 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Jul 2022 20:45:41 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> Message-ID: On Fri, 8 Jul 2022 20:36:40 GMT, Vladimir Kozlov wrote: > > Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. > > There is also a lot of unnecessary space used by these addresses: address _code_begin; address _code_end; address _content_begin; address _data_end; address _relocation_begin; address _relocation_end; > > Now that AOT has been removed, we could go back to 3 int fields like in jdk8. > > There is Leyden project for which we may need it. OK, but the X_end pointers could probably be 32-bit size fields relative to X_start. 
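[Editorial sketch, not part of the original message.] The suggestion of keeping each X_begin pointer and replacing the matching X_end pointer with a 32-bit size could look roughly like this. It is a minimal sketch under stated assumptions: `WideRanges`, `CompactRanges` and `compact()` are hypothetical names, the field list is taken from the quoted message, and pairing `_data_end` with `_content_begin` is an assumption made only for illustration. It also assumes every range is smaller than 4 GB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef unsigned char* address;  // stand-in for HotSpot's 'address' typedef

// One full pointer per boundary, as in the field list quoted above.
struct WideRanges {
  address _code_begin;
  address _code_end;
  address _content_begin;
  address _data_end;
  address _relocation_begin;
  address _relocation_end;
};

// Begin pointers kept; each end is stored as a 32-bit size instead.
struct CompactRanges {
  address  _code_begin;
  address  _content_begin;
  address  _relocation_begin;
  uint32_t _code_size;        // replaces _code_end
  uint32_t _data_size;        // replaces _data_end (measured from _content_begin: assumption)
  uint32_t _relocation_size;  // replaces _relocation_end

  address code_end() const       { return _code_begin + _code_size; }
  address data_end() const       { return _content_begin + _data_size; }
  address relocation_end() const { return _relocation_begin + _relocation_size; }
};

// Narrow a begin/end pair to a 32-bit size, checking that it fits.
inline uint32_t range_size(address begin, address end) {
  assert(end >= begin);
  std::ptrdiff_t size = end - begin;
  assert(static_cast<uint64_t>(size) <= UINT32_MAX);
  return static_cast<uint32_t>(size);
}

// Convert the wide form to the compact form.
inline CompactRanges compact(const WideRanges& w) {
  CompactRanges c;
  c._code_begin       = w._code_begin;
  c._content_begin    = w._content_begin;
  c._relocation_begin = w._relocation_begin;
  c._code_size        = range_size(w._code_begin, w._code_end);
  c._data_size        = range_size(w._content_begin, w._data_end);
  c._relocation_size  = range_size(w._relocation_begin, w._relocation_end);
  return c;
}
```

The size fields are grouped after the pointers on purpose: a single pointer followed by a single 32-bit field would typically be padded back up to 16 bytes on a 64-bit target, so the saving only shows up when several narrow fields share the space.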
------------- PR: https://git.openjdk.org/jdk/pull/9165 From kvn at openjdk.org Fri Jul 8 20:55:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Jul 2022 20:55:40 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> Message-ID: On Fri, 8 Jul 2022 19:42:23 GMT, Dean Long wrote: > > > Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. > > > > > > Do you mind using the CompilerType? Since we have this type defined, I think it should be used. Does it make sense to propose this int->CompilerType cleanup as a separate change prior to this one? > > I was going to suggest doing it as a separate change after this one. I agree with Dean. Lets change int->CompLevel in separate changes. CompilerType is not the same as CompLevel. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From kvn at openjdk.org Fri Jul 8 20:55:42 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Jul 2022 20:55:42 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> Message-ID: On Fri, 8 Jul 2022 20:43:36 GMT, Dean Long wrote: > > > Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. > > > There is also a lot of unnecessary space used by these addresses: address _code_begin; address _code_end; address _content_begin; address _data_end; address _relocation_begin; address _relocation_end; > > > Now that AOT has been removed, we could go back to 3 int fields like in jdk8. > > > > > > There is Leyden project for which we may need it. > > OK, but the X_end pointers could probably be 32-bit size fields relative to X_start. I agree with that. I was also thinking about it. 
------------- PR: https://git.openjdk.org/jdk/pull/9165 From coleenp at openjdk.org Fri Jul 8 21:50:12 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 21:50:12 GMT Subject: RFR: 8275662: remove test/lib/sun/hotspot Message-ID: This change removes the last remnants of sun/hotspot/WhiteBox.java and other classes, and uses the versions in jdk/test/whitebox. I used sed to change sun.hotspot.{gc,code,cpuinfo} to jdk.test.whitebox and deleted the old files and some references to sun.hotspot. Tested with tier1-4. ------------- Commit messages: - 8275662: remove test/lib/sun/hotspot Changes: https://git.openjdk.org/jdk/pull/9434/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9434&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8275662 Stats: 1484 lines in 99 files changed: 0 ins; 1367 del; 117 mod Patch: https://git.openjdk.org/jdk/pull/9434.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9434/head:pull/9434 PR: https://git.openjdk.org/jdk/pull/9434 From mseledtsov at openjdk.org Fri Jul 8 21:50:12 2022 From: mseledtsov at openjdk.org (Mikhailo Seledtsov) Date: Fri, 8 Jul 2022 21:50:12 GMT Subject: RFR: 8275662: remove test/lib/sun/hotspot In-Reply-To: References: Message-ID: <-s197KfRIxYZ-QIqR_c48nZlUjed9kTjHTOGOqlVHWg=.a11735f0-9e5b-44f4-907a-c11230540a87@github.com> On Fri, 8 Jul 2022 19:46:17 GMT, Coleen Phillimore wrote: > This change removes the last remnants of sun/hotspot/WhiteBox.java and other classes, and uses the versions in jdk/test/whitebox. > I used sed to change sun.hotspot.{gc,code,cpuinfo} to jdk.test.whitebox and deleted the old files and some references to sun.hotspot. > Tested with tier1-4. Changes look good to me. Thank you. ------------- Marked as reviewed by mseledtsov (Committer). 
PR: https://git.openjdk.org/jdk/pull/9434 From coleenp at openjdk.org Fri Jul 8 21:50:13 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Jul 2022 21:50:13 GMT Subject: RFR: 8275662: remove test/lib/sun/hotspot In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 19:46:17 GMT, Coleen Phillimore wrote: > This change removes the last remnants of sun/hotspot/WhiteBox.java and other classes, and uses the versions in jdk/test/whitebox. > I used sed to change sun.hotspot.{gc,code,cpuinfo} to jdk.test.whitebox and deleted the old files and some references to sun.hotspot. > Tested with tier1-4. Thanks Misha! ------------- PR: https://git.openjdk.org/jdk/pull/9434 From sspitsyn at openjdk.org Fri Jul 8 22:47:40 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Jul 2022 22:47:40 GMT Subject: RFR: 8275662: remove test/lib/sun/hotspot In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 19:46:17 GMT, Coleen Phillimore wrote: > This change removes the last remnants of sun/hotspot/WhiteBox.java and other classes, and uses the versions in jdk/test/whitebox. > I used sed to change sun.hotspot.{gc,code,cpuinfo} to jdk.test.whitebox and deleted the old files and some references to sun.hotspot. > Tested with tier1-4. This looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9434

From sspitsyn at openjdk.org Fri Jul 8 22:58:41 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 8 Jul 2022 22:58:41 GMT
Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 6 Jul 2022 17:50:54 GMT, Ioi Lam wrote:

>> `Forte::register_stub()` should be called only when the JVM is being instrumented by Forte (aka "Oracle Developer Studio")
>>
>> https://www.oracle.com/tools/developerstudio/downloads/developer-studio-jsp.html
>>
>> We currently always format the name of generated stubs and call `Forte::register_stub()`, which usually does nothing.
>>
>> Example:
>>
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2686-L2697
>>
>> To improve start-up, we should check if Forte is enabled before formatting the name.
>>
>> I also renamed some `#ifndef IA64` around the code that I touched.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
>   Do not remove Forte::register_stub as it is used on Linux as well

I've posted one comment. Other than that the fix looks okay.

Thanks,
Serguei

src/hotspot/share/runtime/sharedRuntime.cpp line 2700:

> 2698: if (JvmtiExport::should_post_dynamic_code_generated()) {
> 2699: JvmtiExport::post_dynamic_code_generated(blob_id, new_adapter->content_begin(), new_adapter->content_end());
> 2700: }

Lines 2698-2700 would be better moved out of the if-statement at line 2687.

-------------

Marked as reviewed by sspitsyn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9386

From iklam at openjdk.org Fri Jul 8 23:09:46 2022
From: iklam at openjdk.org (Ioi Lam)
Date: Fri, 8 Jul 2022 23:09:46 GMT
Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 8 Jul 2022 22:53:49 GMT, Serguei Spitsyn wrote:

>> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Do not remove Forte::register_stub as it is used on Linux as well
>
> src/hotspot/share/runtime/sharedRuntime.cpp line 2700:
>
>> 2698: if (JvmtiExport::should_post_dynamic_code_generated()) {
>> 2699: JvmtiExport::post_dynamic_code_generated(blob_id, new_adapter->content_begin(), new_adapter->content_end());
>> 2700: }
>
> Lines 2698-2700 would be better moved out of the if-statement at line 2687.

Hi Serguei, thanks for the review. Lines 2698-2700 need the blob_id which is generated at line 2688, so they have to stay inside the outer "if" block.

-------------

PR: https://git.openjdk.org/jdk/pull/9386

From sspitsyn at openjdk.org Sat Jul 9 01:04:46 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 9 Jul 2022 01:04:46 GMT
Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 8 Jul 2022 23:06:03 GMT, Ioi Lam wrote:

>> src/hotspot/share/runtime/sharedRuntime.cpp line 2700:
>>
>>> 2698: if (JvmtiExport::should_post_dynamic_code_generated()) {
>>> 2699: JvmtiExport::post_dynamic_code_generated(blob_id, new_adapter->content_begin(), new_adapter->content_end());
>>> 2700: }
>>
>> Lines 2698-2700 would be better moved out of the if-statement at line 2687.
>
> Hi Serguei, thanks for the review. Lines 2698-2700 need the blob_id which is generated at line 2688, so they have to stay inside the outer "if" block.

Thanks, Ioi. You are right.
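The shape of the change discussed in this thread can be sketched roughly as follows: check cheaply whether anyone will consume the stub name before formatting it, and keep every use of the formatted name inside that guarded block. All names here (`profiler_attached`, `post_code_events`, `format_stub_name`, `on_stub_generated`) are hypothetical stand-ins for illustration, not HotSpot's real API.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins for the two runtime checks; NOT HotSpot's real API.
static bool profiler_attached = false;  // plays the role of "Forte is enabled"
static bool post_code_events  = false;  // plays the role of the JVMTI check

static int         format_calls = 0;    // counts how often a name was formatted
static std::string last_registered;     // last name handed to the profiler
static std::string last_posted;         // last name posted as an event

// The comparatively expensive step to avoid when nobody is listening.
static std::string format_stub_name(const char* kind, int id) {
  ++format_calls;
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%s_%d", kind, id);
  return std::string(buf);
}

void on_stub_generated(const char* kind, int id) {
  // Cheap checks first: skip the formatting entirely if no consumer is active.
  if (profiler_attached || post_code_events) {
    const std::string blob_id = format_stub_name(kind, id);
    if (profiler_attached) {
      last_registered = blob_id;
    }
    // This use depends on blob_id, so it cannot move outside the outer block.
    if (post_code_events) {
      last_posted = blob_id;
    }
  }
}
```

With both flags off, `on_stub_generated` returns without formatting anything, which is the start-up saving the change is after.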
------------- PR: https://git.openjdk.org/jdk/pull/9386 From duke at openjdk.org Sat Jul 9 03:06:30 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Sat, 9 Jul 2022 03:06:30 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines in C2 [v3] In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: > A trampoline stub could be generated for each runtime call. These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. > > Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Use a hash table to deduplicate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9405/files - new: https://git.openjdk.org/jdk/pull/9405/files/0c225a66..df99b229 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=01-02 Stats: 35 lines in 2 files changed: 19 ins; 12 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9405.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9405/head:pull/9405 PR: https://git.openjdk.org/jdk/pull/9405 From lmesnik at openjdk.org Sat Jul 9 03:46:42 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 9 Jul 2022 03:46:42 GMT Subject: RFR: 8275662: remove test/lib/sun/hotspot In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 19:46:17 GMT, Coleen Phillimore wrote: > This change removes the last remnants of sun/hotspot/WhiteBox.java and other classes, and uses the versions in jdk/test/whitebox. > I used sed to change sun.hotspot.{gc,code,cpuinfo} to jdk.test.whitebox and deleted the old files and some references to sun.hotspot. > Tested with tier1-4. Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9434 From iklam at openjdk.org Sat Jul 9 03:48:50 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 9 Jul 2022 03:48:50 GMT Subject: RFR: 8289780: Avoid formatting stub names when Forte is not enabled [v2] In-Reply-To: <9iGaC4aFHNx6w3kzoJNFvidssIKlNdNl5TFB1MxhoTI=.1feed3a4-92a2-4272-86b8-38c310303a3e@github.com> References: <9iGaC4aFHNx6w3kzoJNFvidssIKlNdNl5TFB1MxhoTI=.1feed3a4-92a2-4272-86b8-38c310303a3e@github.com> Message-ID: On Wed, 6 Jul 2022 01:24:09 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Do not remove Forte::register_stub as it is used on Linux as well > > Cleanup looks good! 
Thanks. Thanks @dholmes-ora, @sspitsyn, @coleenp for the review. ------------- PR: https://git.openjdk.org/jdk/pull/9386 From iklam at openjdk.org Sat Jul 9 03:49:59 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 9 Jul 2022 03:49:59 GMT Subject: Integrated: 8289780: Avoid formatting stub names when Forte is not enabled In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 00:38:14 GMT, Ioi Lam wrote: > `Forte::register_stub()` should be called only when the JVm is being instrumented by Forte (aka "Oracle Developer Studio") > > https://www.oracle.com/tools/developerstudio/downloads/developer-studio-jsp.html > > We currently always format the name of generated stubs and call `Forte::register_stub()`, which usually does nothing. > > Example: > > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2686-L2697 > > To improve start-up, we should check if Forte is enabled before formatting the name. > > I also renamed some `#ifndef IA64` around the code that I touched. This pull request has now been integrated. Changeset: 3c08e6b3 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/3c08e6b311121e05e30b88c0e325317f364ef15d Stats: 37 lines in 5 files changed: 18 ins; 3 del; 16 mod 8289780: Avoid formatting stub names when Forte is not enabled Reviewed-by: dholmes, coleenp, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/9386 From kvn at openjdk.org Sat Jul 9 04:25:27 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 9 Jul 2022 04:25:27 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: References: Message-ID: <4WPJvLoIioVl4o0Ro5YcBOg82izdiv1T0Re1nTGwOEo=.3d3bbe1d-fb5e-4908-be4f-fd5266c2d04a@github.com> On Wed, 6 Jul 2022 07:51:01 GMT, Fei Gao wrote: > Superword doesn't vectorize any nodes of non-primitive types and > thus sets `allow_address` false when calling type2aelembytes() in > SuperWord::data_size()[1]. 
Therefore, when we try to resolve the > data size for a node of T_ADDRESS type, the assertion in > type2aelembytes()[2] takes effect. > > We try to resolve the data sizes for node s and node t in the > SuperWord::adjust_alignment_for_type_conversion()[3] when type > conversion between different data sizes happens. The issue is, > when node s is a ConvI2L node and node t is an AddP node of > T_ADDRESS type, type2aelembytes() will assert. To fix it, we > should filter out all non-primitive nodes, like the patch does > in SuperWord::adjust_alignment_for_type_conversion(). Since > it's a failure in the mid-end, all superword available platforms > are affected. In my local test, this failure can be reproduced > on both x86 and aarch64. With this patch, the failure can be fixed. > > Apart from fixing the bug, the patch also adds necessary type check > and does some clean-up in SuperWord::longer_type_for_conversion() > and VectorCastNode::implemented(). > > [1]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1417 > [2]https://github.com/openjdk/jdk/blob/b96ba19807845739b36274efb168dd048db819a3/src/hotspot/share/utilities/globalDefinitions.cpp#L326 > [3]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1454 Testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9391 From iklam at openjdk.org Sat Jul 9 05:02:53 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 9 Jul 2022 05:02:53 GMT Subject: RFR: 8290027: Move inline functions from vm_version_x86.hpp to cpp Message-ID: There are several large inline functions in vm_version_x86.hpp that are used only by vm_version_x86.cpp. E.g., `feature_flags()` These should be moved to vm_version_x86.cpp to improve C++ build speed. I manually diff'ed the lines that were moved between the two files. They were identical except for whitespaces. 
------------- Commit messages: - 8290027: Move inline functions from vm_version_x86.hpp to cpp Changes: https://git.openjdk.org/jdk/pull/9431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9431&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290027 Stats: 638 lines in 2 files changed: 320 ins; 311 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/9431.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9431/head:pull/9431 PR: https://git.openjdk.org/jdk/pull/9431 From aph at openjdk.org Sat Jul 9 09:31:30 2022 From: aph at openjdk.org (Andrew Haley) Date: Sat, 9 Jul 2022 09:31:30 GMT Subject: RFR: 8282322: AArch64: Provide a means to eliminate all STREX family of instructions [v8] In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 08:57:25 GMT, Dmitry Chuyko wrote: > Andrew, thanks for taking a look. This change is now for the master, later we can also consider update releases. You'll have to take your chances with that. I'm not sure that it qualifies for a backport under any of the usual criteria, but we can discuss that. ------------- PR: https://git.openjdk.org/jdk/pull/8779 From kbarrett at openjdk.org Sat Jul 9 17:10:38 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 9 Jul 2022 17:10:38 GMT Subject: RFR: 8290027: Move inline functions from vm_version_x86.hpp to cpp In-Reply-To: References: Message-ID: <-E3B6dMNMitVgtWhHZL2QsJqebbBWEFUDIJ1kRvZTTM=.808d4c14-0630-464d-8983-ed9c3b233acc@github.com> On Fri, 8 Jul 2022 17:19:34 GMT, Ioi Lam wrote: > There are several large inline functions in vm_version_x86.hpp that are used only by vm_version_x86.cpp. E.g., `feature_flags()` > > These should be moved to vm_version_x86.cpp to improve C++ build speed. > > I manually diff'ed the lines that were moved between the two files. They were identical except for whitespaces. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9431 From iklam at openjdk.org Sat Jul 9 23:27:31 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 9 Jul 2022 23:27:31 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: > Another step of moving unnecessary stuff outside of os.hpp > > The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. > > I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: renamed to os_linux_impl.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9423/files - new: https://git.openjdk.org/jdk/pull/9423/files/5c540f8c..69f7272b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9423&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9423&range=02-03 Stats: 17 lines in 18 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/9423.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9423/head:pull/9423 PR: https://git.openjdk.org/jdk/pull/9423 From iklam at openjdk.org Sat Jul 9 23:27:31 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 9 Jul 2022 23:27:31 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v3] In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 18:48:19 GMT, Coleen Phillimore wrote: > This seems fine. The name could be os_linux_impl.hpp, since impl looks a lot like inline surrounded by dots. os_share_linux.hpp should be deleted if it's not used. I wouldn't think of looking there for anything. I renamed the file to os_linux_impl.hpp. I'll remove os_share_linux.hpp in a separate PR. 
------------- PR: https://git.openjdk.org/jdk/pull/9423 From stuefe at openjdk.org Sun Jul 10 04:41:43 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 10 Jul 2022 04:41:43 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > renamed to os_linux_impl.hpp I'm surprised that this works. That you are able to declare a nested class outside its enclosing class. Apart from that, seeing that this has nothing really to do anymore with os::, why not just drop the "os::" prefix? Just call it "Linux" or "LinuxImpl" or "LinuxHelpers" or "LinuxOsHelpers" ... ------------- PR: https://git.openjdk.org/jdk/pull/9423 From duke at openjdk.org Sun Jul 10 16:20:16 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Sun, 10 Jul 2022 16:20:16 GMT Subject: RFR: 8263377: Store method handle linkers in the 'non-nmethods' heap [v5] In-Reply-To: References: Message-ID: > 8263377: Store method handle linkers in the 'non-nmethods' heap Yi-Fan Tsai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 20 commits:

 - Fix merge difference
 - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics
 - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics
 - Post dynamic_code_generate event when MH intrinsic generated
 - Remove dead codes

   remove unused argument of NativeJump::check_verified_entry_alignment
   remove unused argument of NativeJump::patch_verified_entry
   remove dead codes in SharedRuntime::generate_method_handle_intrinsic_wrapper
 - Add PrintCodeCache support
 - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics
 - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics
 - Move to RuntimeBlob
 - Merge branch 'master' of https://github.com/yftsai/jdk into intrinsics
 - ... and 10 more: https://git.openjdk.org/jdk/compare/87aa3ce0...f65f7c08

-------------

Changes: https://git.openjdk.org/jdk/pull/8760/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=8760&range=04
  Stats: 586 lines in 58 files changed: 279 ins; 174 del; 133 mod
  Patch: https://git.openjdk.org/jdk/pull/8760.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/8760/head:pull/8760

PR: https://git.openjdk.org/jdk/pull/8760

From dholmes at openjdk.org Sun Jul 10 23:20:41 2022
From: dholmes at openjdk.org (David Holmes)
Date: Sun, 10 Jul 2022 23:20:41 GMT
Subject: RFR: 8289743: AArch64: Clean up patching logic
In-Reply-To:
References:
Message-ID:

On Wed, 6 Jul 2022 13:28:06 GMT, Andrew Haley wrote:

> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster.

There are failures in tier1, 2 and 3 (ZGC use is in tier3). Most failures are on macOS.
Failing tests: compiler/codecache/stress/RandomAllocationTest.java compiler/codegen/TestOopCmp.java compiler/unsafe/UnsafeGetConstantField.java runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java#id4 ------------- PR: https://git.openjdk.org/jdk/pull/9398 From dholmes at openjdk.org Mon Jul 11 00:08:38 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Jul 2022 00:08:38 GMT Subject: RFR: 8290027: Move inline functions from vm_version_x86.hpp to cpp In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 17:19:34 GMT, Ioi Lam wrote: > There are several large inline functions in vm_version_x86.hpp that are used only by vm_version_x86.cpp. E.g., `feature_flags()` > > These should be moved to vm_version_x86.cpp to improve C++ build speed. > > I manually diff'ed the lines that were moved between the two files. They were identical except for whitespaces. Seems fine. thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9431 From dholmes at openjdk.org Mon Jul 11 00:59:32 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Jul 2022 00:59:32 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: <2mJqYX9o95PzmzgUF7sZo8ibzGI3xw3nyWWR9IvCJjA=.839909bd-7720-40d1-af96-5d15c585f96a@github.com> On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > renamed to os_linux_impl.hpp I share Thomas's surprise that this works. 
It also doesn't really make sense to me. You've rendered os_linux.hpp essentially an empty file and lament that there is not a good name for the new file because os_linux.hpp is taken - but that's because os_linux.hpp is actually where os::linux should be declared! You want to make os.hpp smaller so perhaps the thing to tackle is why os_.hpp gets included in os.hpp in the first place. Unless it adds to the shared os API (which seems it can't as then it could just be in os.hpp) then there should not be anything in os_.hpp that is needed for the shared os interface. So perhaps a different refactoring across all the os files is what is needed here. ------------- PR: https://git.openjdk.org/jdk/pull/9423 From fgao at openjdk.org Mon Jul 11 01:37:39 2022 From: fgao at openjdk.org (Fei Gao) Date: Mon, 11 Jul 2022 01:37:39 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: <6KltnZFZDqf5kuIjF_t0ns-DVjqSaLj8kFfBxS6rwt0=.aa0372ae-3c0b-4b00-b005-5431a113c9f4@github.com> References: <6KltnZFZDqf5kuIjF_t0ns-DVjqSaLj8kFfBxS6rwt0=.aa0372ae-3c0b-4b00-b005-5431a113c9f4@github.com> Message-ID: On Fri, 8 Jul 2022 20:32:24 GMT, Vladimir Kozlov wrote: >>> In which call to `adjust_alignment_for_type_conversion()` you got AddP node? Should we add checks there too? >> >> Thanks for your review, @vnkozlov . >> >> When we called `adjust_alignment_for_type_conversion()` in `SuperWord::follow_def_uses()`, https://github.com/openjdk/jdk/blob/3f1174aa4709aabcfde8b40deec88b8ed466cc06/src/hotspot/share/opto/superword.cpp#L1525, we got AddP node. In this function, we also call `stmts_can_pack()` on the next line, which has checks to prevent unwanted pairs, https://github.com/openjdk/jdk/blob/3f1174aa4709aabcfde8b40deec88b8ed466cc06/src/hotspot/share/opto/superword.cpp#L1202. Maybe we don't have to add one more. WDYT? 
> > @fg1417 `stmts_can_pack()` is called in an other place which is preceded by `are_adjacent_refs()` call which also has primitive type check (but different). I was thinking to convert checks in `stmts_can_pack()` to `assert`. But, on other hand, `is_java_primitive(bt)` is cheap and I would prefer to keep checks in `stmts_can_pack()` as they are in case we call it in an other place. > > Anyway. After looking on code I agree with your current changes. Let me test it. And you need second review. Thanks for your review and test work, @vnkozlov . May I have a second review please? ------------- PR: https://git.openjdk.org/jdk/pull/9391 From iklam at openjdk.org Mon Jul 11 03:14:51 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 11 Jul 2022 03:14:51 GMT Subject: RFR: 8290027: Move inline functions from vm_version_x86.hpp to cpp [v2] In-Reply-To: References: Message-ID: > There are several large inline functions in vm_version_x86.hpp that are used only by vm_version_x86.cpp. E.g., `feature_flags()` > > These should be moved to vm_version_x86.cpp to improve C++ build speed. > > I manually diff'ed the lines that were moved between the two files. They were identical except for whitespaces. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8290027-move-unnecessary-inline-funcs-vm-version-x86 - 8290027: Move inline functions from vm_version_x86.hpp to cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9431/files - new: https://git.openjdk.org/jdk/pull/9431/files/9ff42ede..9b38dce8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9431&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9431&range=00-01 Stats: 1095 lines in 53 files changed: 737 ins; 149 del; 209 mod Patch: https://git.openjdk.org/jdk/pull/9431.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9431/head:pull/9431 PR: https://git.openjdk.org/jdk/pull/9431 From iklam at openjdk.org Mon Jul 11 03:32:45 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 11 Jul 2022 03:32:45 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > renamed to os_linux_impl.hpp David and Thomas, I think you're right. The current structure of the `os.hpp` header file doesn't make sense. Files like `os_linux.hpp` and `os_linux_x86.hpp` declare additional member functions of the `os` class. However, these functions are never (or should never be) used by cross platform code. 
For example, `workaround_expand_exec_shield_cs_limit()` declared in `os_linux_x86.hpp` is used only under the src/hotspot/os/linux directory.

My proposal is to remove all OS- and platform-specific includes from `os.hpp`. The `os` class should include only functions that are usable from shared code.

Then, we can keep the `os_linux.hpp` file, which in turn would include other `os_linux_.hpp` files. These files should declare functions that are used by Linux-specific source files.

For naming, I would prefer to leave it as `os::Linux` for now, so as to avoid making a huge number of changes. We can change it to something else (like the ones suggested by Thomas) in a separate RFE.

Yes, it's legal to declare `os::Linux` outside of the declaration of the `os` class. See https://en.cppreference.com/w/cpp/language/nested_types

With my new proposal, the function `workaround_expand_exec_shield_cs_limit()` declared in `os_linux_x86.hpp` will move from `os::workaround_expand_exec_shield_cs_limit()` to `os::Linux::workaround_expand_exec_shield_cs_limit()`.

-------------

PR: https://git.openjdk.org/jdk/pull/9423

From stuefe at openjdk.org Mon Jul 11 04:49:41 2022
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 11 Jul 2022 04:49:41 GMT
Subject: RFR: 8265473: Move os::Linux to its own header file [v4]
In-Reply-To:
References:
Message-ID:

On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote:

>> Another step of moving unnecessary stuff outside of os.hpp
>>
>> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code.
>>
>> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
>   renamed to os_linux_impl.hpp

Interesting about forward-declaring nested classes; I did not know that was possible.

I think what you are trying to do - disentangling the interface and removing platform-dependent stuff - is really useful. I even think a lot of the platform-dependent stuff does not need to appear in any header at all; local static functions in os_xxx.cpp would have been fine.

I propose a slightly different way of doing this, though. How about making `os` a real namespace? That's what AllStatic tries to be anyway. It's only a class because namespaces did not exist when it was introduced. In contrast to a class, a namespace can be extended, whereas a class must always be complete.

You could have:

os.hpp

    namespace os {
      ..
      // Return the default page size.
      static int vm_page_size();
      .. // many more
    }

os_linux.hpp

    namespace os {
      namespace Linux {
        ...
        void print_process_memory_info(outputStream* st);
        void print_system_memory_info(outputStream* st);
        ...
      }
    }

Compilation units just needing the generic part can include `os.hpp` and be done with it. os_linux.cpp would include both os.hpp and os_linux.hpp.

As a bonus, we can remove those awkward injections of class definitions into the middle of class os. The os_xxx.hpp headers would then become standard headers, includable on their own.

We would lose the ability to make os:: functions private in the interface. But that can be solved by moving them into their own header. Or by really taking a good look: why would private members have to appear in a public interface?

I realize that this would be a bigger change than what you have planned, but I think that this would be the standard way to organize an interface like this, and easier to understand and to maintain.

What do you think?
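The namespace proposal above can be sketched in a single self-contained file. The three sections stand in for `os.hpp`, `os_linux.hpp` and `os_linux.cpp`; the function `describe_process_memory()`, the helper `cached_page_size()` and the fixed page size of 4096 are placeholders invented for this illustration, not HotSpot code.

```cpp
#include <string>

// --- would live in os.hpp: the shared, platform-independent interface ---
namespace os {
  // Return the default page size.
  int vm_page_size();
}

// --- would live in os_linux.hpp: reopens the namespace, os.hpp is untouched ---
namespace os {
  namespace Linux {
    // Hypothetical Linux-only helper, visible only to files including this header.
    std::string describe_process_memory();
  }
}

// --- would live in os_linux.cpp: includes both headers ---
namespace {
  // Implementation detail kept out of every header (placeholder value).
  int cached_page_size() { return 4096; }
}

int os::vm_page_size() {
  return cached_page_size();
}

std::string os::Linux::describe_process_memory() {
  return "page size: " + std::to_string(os::vm_page_size());
}
```

Shared code would see only `os::vm_page_size()`, while details that would otherwise be private class members can sit in an unnamed namespace inside the .cpp file. What this arrangement gives up is class-style access control, which is the trade-off debated in the replies.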
------------- PR: https://git.openjdk.org/jdk/pull/9423 From iklam at openjdk.org Mon Jul 11 05:23:44 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 11 Jul 2022 05:23:44 GMT Subject: RFR: 8290027: Move inline functions from vm_version_x86.hpp to cpp [v2] In-Reply-To: <-E3B6dMNMitVgtWhHZL2QsJqebbBWEFUDIJ1kRvZTTM=.808d4c14-0630-464d-8983-ed9c3b233acc@github.com> References: <-E3B6dMNMitVgtWhHZL2QsJqebbBWEFUDIJ1kRvZTTM=.808d4c14-0630-464d-8983-ed9c3b233acc@github.com> Message-ID: On Sat, 9 Jul 2022 17:07:26 GMT, Kim Barrett wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' of https://github.com/openjdk/jdk into 8290027-move-unnecessary-inline-funcs-vm-version-x86 >> - 8290027: Move inline functions from vm_version_x86.hpp to cpp > > Looks good. Thanks to @kimbarrett and @dholmes-ora for the review. ------------- PR: https://git.openjdk.org/jdk/pull/9431 From iklam at openjdk.org Mon Jul 11 05:23:46 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 11 Jul 2022 05:23:46 GMT Subject: Integrated: 8290027: Move inline functions from vm_version_x86.hpp to cpp In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 17:19:34 GMT, Ioi Lam wrote: > There are several large inline functions in vm_version_x86.hpp that are used only by vm_version_x86.cpp. E.g., `feature_flags()` > > These should be moved to vm_version_x86.cpp to improve C++ build speed. > > I manually diff'ed the lines that were moved between the two files. They were identical except for whitespaces. This pull request has now been integrated. 
Changeset: e9d9cc6d Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/e9d9cc6d0aece2237c490a610d79a562867251d8 Stats: 638 lines in 2 files changed: 320 ins; 311 del; 7 mod 8290027: Move inline functions from vm_version_x86.hpp to cpp Reviewed-by: kbarrett, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/9431 From iklam at openjdk.org Mon Jul 11 05:27:39 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 11 Jul 2022 05:27:39 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > renamed to os_linux_impl.hpp I like the idea of using namespaces. I think the private methods in os.hpp can be moved to share/runtime/osImpl.hpp, which should be included only by the os*.cpp files. Let me try to do a prototype of this. ------------- PR: https://git.openjdk.org/jdk/pull/9423 From dholmes at openjdk.org Mon Jul 11 06:31:49 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Jul 2022 06:31:49 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 05:24:31 GMT, Ioi Lam wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> renamed to os_linux_impl.hpp > > I like the idea of using namespaces. 
I think the private methods in os.hpp can be moved to share/runtime/osImpl.hpp, which should be included only by the os*.cpp files. Let me try to do a prototype of this.

@iklam I thought we very recently rejected use of namespaces due to the visibility issue?

Call me old fashioned but I much prefer classes for defining interfaces.

------------- PR: https://git.openjdk.org/jdk/pull/9423

From stuefe at openjdk.org Mon Jul 11 06:53:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 11 Jul 2022 06:53:28 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID:

On Mon, 11 Jul 2022 06:27:57 GMT, David Holmes wrote:

> > @iklam I thought we very recently rejected use of namespaces due to the visibility issue?
> > Call me old fashioned but I much prefer classes for defining interfaces.

We really should rethink this rule. I find it baffling that we include many modern C++ features (e.g. template metaprogramming, which is often a mixed blessing) but are still avoiding simple namespaces.

Had a read over the hotspot style guide. It cites advantages of AllStatic vs namespace:

1 Provides access control for members, which is unavailable with namespaces.
2 Avoids [Argument Dependent Lookup][ADL] (ADL).
3 Closed for additional members. Namespaces allow names to be added in multiple contexts, making it harder to see the complete API.

I cannot comment off-hand about the ADL issue (2), but the other two points are actually advantages of namespaces, not disadvantages. At least when it comes to things like "os":

- I don't _want_ access control for members in a public interface. A public interface should be clean and minimal. Allowing private members in an interface tempts devs into adding private implementation details, which have no place in a public header. That leads to a lot of unnecessary include dependencies. Using namespaces, OTOH, would be conducive to clean interface design.
- (3) is actually the biggest advantage.
I can extend the os namespace without having to change the central header. Which, as I have argued before, leads to a much cleaner include file structure.

------------- PR: https://git.openjdk.org/jdk/pull/9423

From dholmes at openjdk.org Mon Jul 11 08:05:52 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Jul 2022 08:05:52 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: <5eYjVTQ80mneW6E8H_C33_lUYGbTYC60q5W5hy8WPWc=.ea0527b3-baf7-483c-9f79-30fce4b95662@github.com>

On Mon, 11 Jul 2022 06:48:55 GMT, Thomas Stuefe wrote:

> - I don't _want_ access control for members in a public interface. A public interface should be clean and minimal. Allowing private members in an interface tempts devs into adding private implementation details, which have no place in a public header.

You are basically arguing against C++ class-based design there. C++ classes define both the public and non-public interfaces of a class. Header files include the full class definition. Ergo public header files have private implementation details.

------------- PR: https://git.openjdk.org/jdk/pull/9423

From ksakata at openjdk.org Mon Jul 11 08:05:54 2022 From: ksakata at openjdk.org (Koichi Sakata) Date: Mon, 11 Jul 2022 08:05:54 GMT Subject: RFR: 8280472: Don't mix legacy logging with UL In-Reply-To: References: Message-ID:

On Thu, 16 Jun 2022 01:39:33 GMT, Koichi Sakata wrote:

> This PR removes extra conditions related to Unified Logging.
>
> Those conditions have been left after the transition to Unified Logging. This is the only place that uses UL, Verbose and WizardMode flags together. This JBS issue suggests removing those flags.
>
> # Details
> At present, outputting the target log messages requires a debug build of OpenJDK and the Verbose or WizardMode option.
>
> $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=info -XX:+Verbose -version
> (Omitted)
> [0.090s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
> [0.090s][info][methodhandles] {method}
> [0.090s][info][methodhandles] - this oop: 0x00000001303d6238
> [0.090s][info][methodhandles] - method holder: public synchronized abstract 'java/lang/invoke/MethodHandle'
> (Omitted)
> [0.090s][info][methodhandles] - signature handler: 0x0000000000000000
> [0.090s][info][methodhandles] lookup_polymorphic_method => intrinsic {method}
> [0.090s][info][methodhandles] - this oop: 0x00000001303d6238
>
> Target log messages are from `{method}` to `- signature handler`.
>
> $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=info -version
> (Omitted)
> [0.134s][info][methodhandles] lookup_polymorphic_method linkToStatic (Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object; => basic (Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
> [0.134s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
> [0.134s][info][methodhandles] lookup_polymorphic_method => intrinsic {method}
> [0.134s][info][methodhandles] - this oop: 0x000000012abd5e58
>
> When those flags are off, UL doesn't output them.
>
> # Test
> There is no test code for it. So I built and ran OpenJDK to confirm the log output myself.
>
> ## Run with Log Level DEBUG After Applying This Patch
>
> $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=debug -version
> (Omitted)
> [0.132s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
> [0.132s][debug][methodhandles] {method}
> [0.132s][debug][methodhandles] - this oop: 0x00000001217d6b98
> (Omitted)
> [0.132s][debug][methodhandles] - signature handler: 0x0000000000000000
> [0.132s][info ][methodhandles] lookup_polymorphic_method => intrinsic {method}
> [0.132s][info ][methodhandles] - this oop: 0x00000001217d6b98
>
> UL output the target log messages at the debug level. It was successful.
>
> ## Run with Log Level INFO After Applying This Patch
>
> $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=info -version
> (Omitted)
> [0.086s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
> [0.086s][info][methodhandles] lookup_polymorphic_method => intrinsic {method}
> [0.086s][info][methodhandles] - this oop: 0x000000011f7d69f8
>
> UL didn't output them. That was as I intended.

Would somebody please sponsor this pull request?

------------- PR: https://git.openjdk.org/jdk/pull/9175

From dholmes at openjdk.org Mon Jul 11 08:40:46 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Jul 2022 08:40:46 GMT Subject: RFR: 8280472: Don't mix legacy logging with UL In-Reply-To: References: Message-ID:

On Thu, 16 Jun 2022 01:39:33 GMT, Koichi Sakata wrote:

> This PR removes extra conditions related to Unified Logging.
> [...]

Sorry I didn't see your integration request. But hotspot changes require two reviews, so we are still waiting for a second reviewer. I will see if I can get someone to do so.

------------- PR: https://git.openjdk.org/jdk/pull/9175

From mgronlun at openjdk.org Mon Jul 11 09:09:32 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 11 Jul 2022 09:09:32 GMT Subject: RFR: 8280472: Don't mix legacy logging with UL In-Reply-To: References: Message-ID:

On Thu, 16 Jun 2022 01:39:33 GMT, Koichi Sakata wrote:

> This PR removes extra conditions related to Unified Logging.
> [...]

Marked as reviewed by mgronlun (Reviewer).

------------- PR: https://git.openjdk.org/jdk/pull/9175

From ksakata at openjdk.org Mon Jul 11 09:26:44 2022 From: ksakata at openjdk.org (Koichi Sakata) Date: Mon, 11 Jul 2022 09:26:44 GMT Subject: Integrated: 8280472: Don't mix legacy logging with UL In-Reply-To: References: Message-ID: <-Ew0OrvK7OATEDZgVnSVtFOU3hPNmf81PA8FJ9mz78c=.dd50a93a-b2a4-43eb-bd74-ae33c552d382@github.com>

On Thu, 16 Jun 2022 01:39:33 GMT, Koichi Sakata wrote:

> This PR removes extra conditions related to Unified Logging.
> [...]

This pull request has now been integrated.

Changeset: 2579373d Author: Koichi Sakata Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/2579373dd0cc151dad22e4041f42bbd314b3be5f Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod

8280472: Don't mix legacy logging with UL

Reviewed-by: dholmes, mgronlun

------------- PR: https://git.openjdk.org/jdk/pull/9175

From tschatzl at openjdk.org Mon Jul 11 09:27:00 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Jul 2022 09:27:00 GMT Subject: RFR: 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize [v3] In-Reply-To: References: Message-ID:

> Hi all, > > can I get reviews for this enhancement fixing a sometimes annoying UI issue where setting `-XX:MinTLABSize` does not automatically update `-XX:YoungPLABSize` and `-XX:OldPLABSize` if they are not set? > This avoids some unnecessary retries.
> > Testing: gha, test case > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9425/files - new: https://git.openjdk.org/jdk/pull/9425/files/7d7382bc..af272e30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9425&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9425&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9425.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9425/head:pull/9425 PR: https://git.openjdk.org/jdk/pull/9425 From stuefe at openjdk.org Mon Jul 11 09:29:43 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 11 Jul 2022 09:29:43 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: <5eYjVTQ80mneW6E8H_C33_lUYGbTYC60q5W5hy8WPWc=.ea0527b3-baf7-483c-9f79-30fce4b95662@github.com> References: <5eYjVTQ80mneW6E8H_C33_lUYGbTYC60q5W5hy8WPWc=.ea0527b3-baf7-483c-9f79-30fce4b95662@github.com> Message-ID: On Mon, 11 Jul 2022 08:01:42 GMT, David Holmes wrote: > > * I don't _want_ access control for members in a public interface. A public interface should be clean and minimal. Allowing private members in an interface tempts devs into adding private implementation details, which have no place in a public header. > > You are basically arguing against C++ class-based design there. C++ classes define both the public and non-public interfaces of a class. Header files include the full class definition. Ergo public header files have private implementation details. No, I'm fine with C++ class based design. But AllStatic is just a way to group a bunch of global functions together. That has nothing to do with class-based design, these classes are never instantiated. That is just name scoping. Could have done the same with a common prefix. 
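To make the two positions in this thread concrete, here is a minimal sketch in plain C++ (not HotSpot code). `OsClassStyle`, `os_ns`, `linux_only`, the function names, and the returned constants are all hypothetical and chosen only for illustration; the class merely mimics the all-static idiom and is not HotSpot's `AllStatic`. It shows the trade-off being argued: a class is closed and can carry private members in its header, whereas a namespace can be reopened elsewhere without touching the central header.

```cpp
#include <cassert>

// Style 1: an all-static class. The member list is closed, and private
// helpers still have to be declared inside the (public) class definition.
class OsClassStyle {
 public:
  static int page_size() { return page_size_impl(); }

 private:
  // Implementation detail, yet visible to everyone who includes the header.
  static int page_size_impl() { return 4096; }  // hypothetical value
};

// Style 2: a namespace. This part would live in the central header.
namespace os_ns {
inline int page_size() { return 4096; }  // hypothetical value
}  // namespace os_ns

// The same namespace reopened, as a platform-specific header could do
// without editing the central header. A class cannot be extended this way.
namespace os_ns {
namespace linux_only {
inline int clock_ticks_per_second() { return 100; }  // hypothetical value
}  // namespace linux_only
}  // namespace os_ns
```

Call sites look the same in both styles (`OsClassStyle::page_size()` versus `os_ns::page_size()`), so the difference is only in what the header exposes and in who is able to add members later.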
------------- PR: https://git.openjdk.org/jdk/pull/9423 From iwalulya at openjdk.org Mon Jul 11 11:06:39 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 11 Jul 2022 11:06:39 GMT Subject: RFR: 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize [v3] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 09:27:00 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this enhancement fixing a sometimes annoying UI issue where setting `-XX:MinTLABSize` does not automatically update `-XX:YoungPLABSize` and `-XX:OldPLABSize` if they are not set? >> This avoids some unnecessary retries. >> >> Testing: gha, test case >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review Lgtm! ------------- Marked as reviewed by iwalulya (Reviewer). PR: https://git.openjdk.org/jdk/pull/9425 From dholmes at openjdk.org Mon Jul 11 11:14:48 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 11 Jul 2022 11:14:48 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > renamed to os_linux_impl.hpp AllStatic is just a way to say that a Class has all static methods - no per-instance state. No different IMO to a singleton class, just without the overhead of creating an instance. 
------------- PR: https://git.openjdk.org/jdk/pull/9423 From dnsimon at openjdk.org Mon Jul 11 11:22:15 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 11 Jul 2022 11:22:15 GMT Subject: RFR: 8290075: [JVMCI] only blessed methods can link against EventWriterFactory.getEventWriter Message-ID: [JDK-8282420](https://bugs.openjdk.org/browse/JDK-8282420) introduced the notion of "blessed methods" which are those that can link against `jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long)`. This PR enhances the JVMCI ConstantPool API so that it can take a caller context when resolving a method to enforce this constraint properly. ------------- Commit messages: - support special linkage rules for jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long) in JVMCI Changes: https://git.openjdk.org/jdk/pull/9449/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9449&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290075 Stats: 144 lines in 11 files changed: 126 ins; 6 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/9449.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9449/head:pull/9449 PR: https://git.openjdk.org/jdk/pull/9449 From dnsimon at openjdk.org Mon Jul 11 11:22:15 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 11 Jul 2022 11:22:15 GMT Subject: RFR: 8290075: [JVMCI] only blessed methods can link against EventWriterFactory.getEventWriter In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 11:12:34 GMT, Doug Simon wrote: > [JDK-8282420](https://bugs.openjdk.org/browse/JDK-8282420) introduced the notion of "blessed methods" which are those that can link against `jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long)`. > This PR enhances the JVMCI ConstantPool API so that it can take a caller context when resolving a method to enforce this constraint properly. 
test/jdk/jdk/jfr/jvm/TestGetEventWriter.java line 60:

> 58: * @run main/othervm jdk.jfr.jvm.TestGetEventWriter
> 59: *
> 60: * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Dtest.jvmci=true --add-exports=jdk.jfr/jdk.jfr.internal.event=ALL-UNNAMED

I think `--add-exports=jdk.jfr/jdk.jfr.internal.event=ALL-UNNAMED` should be added to all `@run` specs in this test. See more details [here](https://github.com/openjdk/jdk/pull/8383#discussion_r915755771).

------------- PR: https://git.openjdk.org/jdk/pull/9449

From richard.reingruber at sap.com Mon Jul 11 12:07:50 2022 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 11 Jul 2022 12:07:50 +0000 Subject: State of the ppc64le port of JEP 425: Virtual Threads (Preview) In-Reply-To: References: Message-ID:

Hi,

the port now passes the basic continuation tests jdk/jdk/internal/vm/Continuation/Basic.java with UseContinuationFastPath disabled. Actually all tests in hotspot_loom and jdk_loom succeed except for 2 of them where the held monitor count is wrong (potentially caused by [1]) and another one with a method that is not compilable.

The current version can again be found here: https://github.com/reinrich/loom/commits/ppc_port

Richard.

[1] https://bugs.openjdk.org/browse/JDK-8286957

From: Reingruber, Richard Date: Thursday, 2. June 2022 at 13:38 To: jdk-dev , porters-dev at openjdk.java.net , loom-dev at openjdk.java.net Subject: State of the ppc64le port of JEP 425: Virtual Threads (Preview)

Hi,

I learned today that preview features _must_ be implemented by a port in an OpenJDK release [1]. Unfortunately I have to inform you that I don't think the ppc64le port I'm currently working on will be ready in the JDK19 time frame. When I started the work (Jan. or Dec. I think) I expected to finish it before summer. Even after the last status update [2] I thought I could make it.
But with the difficulties I still experience and being 6-8 weeks out of office in summer it is now rather unlikely. And until this morning I (and actually also my colleagues) assumed this would only be a minor issue.

Current Status of the Port:

* UseContinuationFastPath is disabled
* Basic tests where sequences of interpreted and compiled frames with quite some variations are frozen and thawed succeed.
* GC with stack chunks on the java heap succeeds.
* Basic exception handling tests succeed.
* Basic tests exercising compiled java calls with stack arguments succeed but need to be revisited because there are issues.

[3] is a selection of test cases that I use in development.
[4] is the most recent version of the ppc64le port

Main Technical Problems

* Shared code makes use of the 'unextended sp' of java frames. This breaks the platform abstraction as it makes assumptions on where to find, e.g., stack arguments relative to the unextended sp.
* There are non-obvious interdependencies in the code which make it difficult to fix an issue. In an attempt to fix a problem I often have regressions because I missed adaptations of dependent parts. And then it is extremely tedious to find the cause of the regression running tests and analyzing very long trace output.
* Currently I see that the handling of stack arguments of compiled java methods works in quite some cases (see [3]) but there are cases where it doesn't. Trying alternative approaches means going through the tedious and time consuming process described above.
* Lack of documentation. Heavily templatized implementation.

These problems (except the last) could not be foreseen. From a high level the port simply needs to copy frames between stack and heap and provide some assembler glue code. As I know now, it actually takes considerable effort to get the details tuned right.

Thanks, Richard.
[1] Ports _must_ implement preview features in thread "What should the relationship between ports and developers of large projects be?" https://mail.openjdk.java.net/pipermail/jdk-dev/2022-May/006635.html
[2] State of the ppc64le loom port as of April 14 https://mail.openjdk.java.net/pipermail/loom-dev/2022-April/004197.html
[3] BasicExp.java tests driving development of the port https://github.com/reinrich/loom/blob/3286bc8b72401dbccac59c994919fc425a51cb52/test/jdk/jdk/internal/vm/Continuation/BasicExp.java
[4] Most recent version of the ppc64le loom port https://github.com/reinrich/loom/commits/ppc_port

From rschmelter at openjdk.org Mon Jul 11 12:11:41 2022 From: rschmelter at openjdk.org (Ralf Schmelter) Date: Mon, 11 Jul 2022 12:11:41 GMT Subject: RFR: 8289745: JfrStructCopyFailed uses heap words instead of bytes for object sizes In-Reply-To: References: Message-ID:

On Wed, 6 Jul 2022 13:36:26 GMT, Thomas Stuefe wrote:

>> The values for smallestSize, firstSize and totalSize in the CopyFailed type are set as the number of heap words, but should be number of bytes. This leads to wrong values in the PromotionFailed and EvacuationFailed JFR events containing this type.
>
> test/jdk/jdk/jfr/event/gc/detailed/PromotionFailedEvent.java line 56:
>
>> 54: System.out.println("Event: " + event);
>> 55: long smallestSize = Events.assertField(event, "promotionFailed.smallestSize").atLeast(1L).getValue();
>> 56: Asserts.assertTrue((smallestSize % minObjectAlignment) == 0, "smallestSize " + smallestSize + " is not a valid size.");
>
> Testing for alignment is a good pragmatic way to check for regressions without adding more logic.
>
> Do the numbers include object headers? If yes, we could assert to >= 8 at least.

Yes, the header is included. It is the size we would use if we iterate over a contiguous region of the heap. We check that the size is > 0 in the assert above, so it has to be at least 8.
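The unit mix-up discussed in this review can be illustrated with a small stand-alone C++ sketch. It is not the HotSpot or JFR code: `kHeapWordSize`, `kMinObjAlignmentInBytes`, `words_to_bytes`, and `is_valid_object_size` are hypothetical names, and the 8-byte values are assumptions for a typical 64-bit JVM with default object alignment. The sketch converts a size counted in heap words to bytes and applies the same kind of non-zero and alignment check that the test performs.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for a typical 64-bit JVM; not read from a real VM.
constexpr std::size_t kHeapWordSize = 8;            // bytes per heap word
constexpr std::size_t kMinObjAlignmentInBytes = 8;  // default object alignment

// Convert an object size counted in heap words into bytes.
constexpr std::size_t words_to_bytes(std::size_t words) {
  return words * kHeapWordSize;
}

// Mirror of the test's sanity check: a plausible object size in bytes is
// non-zero and a multiple of the object alignment.
constexpr bool is_valid_object_size(std::size_t bytes) {
  return bytes > 0 && bytes % kMinObjAlignmentInBytes == 0;
}
```

Under these assumptions a two-word object is 16 bytes and passes the check, while reporting the raw word count 2 as a byte size fails it. A word count that happens to be a multiple of 8 would still slip through, so the check is a pragmatic regression guard rather than a proof of correctness.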
------------- PR: https://git.openjdk.org/jdk/pull/9378 From jwilhelm at openjdk.org Mon Jul 11 12:46:47 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Mon, 11 Jul 2022 12:46:47 GMT Subject: RFR: Merge jdk19 Message-ID: Forwardport JDK 19 -> JDK 20 ------------- Commit messages: - Merge - 8290004: [PPC64] JfrGetCallTrace: assert(_pc != nullptr) failed: must have PC - 8289692: JFR: Thread checkpoint no longer enforce mutual exclusion post Loom integration - 8289894: A NullPointerException thrown from guard expression - 8289729: G1: Incorrect verification logic in G1ConcurrentMark::clear_next_bitmap - 8282071: Update java.xml module-info - 8290033: ProblemList serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java on windows-x64 in -Xcomp mode - 8289697: buffer overflow in MTLVertexCache.m: MTLVertexCache_AddGlyphQuad - 8289872: wrong wording in @param doc for HashMap.newHashMap et. al. - 8289601: SegmentAllocator::allocateUtf8String(String str) should be clarified for strings containing \0 - ... and 5 more: https://git.openjdk.org/jdk/compare/46251bc6...0b0d186f The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=jdk&pr=9450&range=00.0 - jdk19: https://webrevs.openjdk.org/?repo=jdk&pr=9450&range=00.1 Changes: https://git.openjdk.org/jdk/pull/9450/files Stats: 411 lines in 43 files changed: 284 ins; 26 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/9450.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9450/head:pull/9450 PR: https://git.openjdk.org/jdk/pull/9450 From jbhateja at openjdk.org Mon Jul 11 13:01:22 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Jul 2022 13:01:22 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation Message-ID: - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. 
- With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). Please review and share your feedback. Best Regards, Jatin ------------- Commit messages: - 8290066: Remove KNL specific handling for new CPU feature IR annotations Changes: https://git.openjdk.org/jdk/pull/9452/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9452&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290066 Stats: 192 lines in 5 files changed: 83 ins; 108 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9452.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9452/head:pull/9452 PR: https://git.openjdk.org/jdk/pull/9452 From coleenp at openjdk.org Mon Jul 11 13:11:31 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 11 Jul 2022 13:11:31 GMT Subject: RFR: 8275662: remove test/lib/sun/hotspot In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 19:46:17 GMT, Coleen Phillimore wrote: > This change removes the last remnants of sun/hotspot/WhiteBox.java and other classes, and uses the versions in jdk/test/whitebox. > I used sed to change sun.hotspot.{gc,code,cpuinfo} to jdk.test.whitebox and deleted the old files and some references to sun.hotspot. > Tested with tier1-4. Thanks Serguei and Leonid. ------------- PR: https://git.openjdk.org/jdk/pull/9434 From coleenp at openjdk.org Mon Jul 11 13:11:31 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 11 Jul 2022 13:11:31 GMT Subject: Integrated: 8275662: remove test/lib/sun/hotspot In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 19:46:17 GMT, Coleen Phillimore wrote: > This change removes the last remnants of sun/hotspot/WhiteBox.java and other classes, and uses the versions in jdk/test/whitebox. 
> I used sed to change sun.hotspot.{gc,code,cpuinfo} to jdk.test.whitebox and deleted the old files and some references to sun.hotspot. > Tested with tier1-4. This pull request has now been integrated. Changeset: 0c370089 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/0c37008917789e7b631b5c18e6f54454b1bfe038 Stats: 1484 lines in 99 files changed: 0 ins; 1367 del; 117 mod 8275662: remove test/lib/sun/hotspot Reviewed-by: mseledtsov, sspitsyn, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/9434 From dnsimon at openjdk.org Mon Jul 11 13:26:34 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 11 Jul 2022 13:26:34 GMT Subject: RFR: 8290075: [JVMCI] only blessed methods can link against EventWriterFactory.getEventWriter [v2] In-Reply-To: References: Message-ID: > [JDK-8282420](https://bugs.openjdk.org/browse/JDK-8282420) introduced the notion of "blessed methods" which are those that can link against `jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long)`. > This PR enhances the JVMCI ConstantPool API so that it can take a caller context when resolving a method to enforce this constraint properly. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: support special linkage rules for jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long) in JVMCI ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9449/files - new: https://git.openjdk.org/jdk/pull/9449/files/a294aac0..589f205f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9449&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9449&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9449.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9449/head:pull/9449 PR: https://git.openjdk.org/jdk/pull/9449 From mdoerr at openjdk.org Mon Jul 11 14:15:59 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 11 Jul 2022 14:15:59 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers Message-ID: Preserve volatile vector registers in ZGC C2 load barrier stub. ------------- Commit messages: - 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers Changes: https://git.openjdk.org/jdk/pull/9453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290082 Stats: 72 lines in 4 files changed: 27 ins; 2 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/9453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9453/head:pull/9453 PR: https://git.openjdk.org/jdk/pull/9453 From eosterlund at openjdk.org Mon Jul 11 14:26:46 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 11 Jul 2022 14:26:46 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 14:09:58 GMT, Martin Doerr wrote: > Preserve volatile vector registers in ZGC C2 load barrier stub. Looks good. Thanks Martin! ------------- Marked as reviewed by eosterlund (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9453 From mcimadamore at openjdk.org Mon Jul 11 14:33:12 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 11 Jul 2022 14:33:12 GMT Subject: [jdk19] Integrated: 8287809: Revisit implementation of memory session In-Reply-To: References: Message-ID: On Wed, 15 Jun 2022 18:06:44 GMT, Maurizio Cimadamore wrote: > This is a JDK 19 clone of: https://github.com/openjdk/jdk/pull/9017 This pull request has now been integrated. Changeset: fed3af8a Author: Maurizio Cimadamore URL: https://git.openjdk.org/jdk19/commit/fed3af8ae069fc760a24e750292acbb468b14ce5 Stats: 429 lines in 21 files changed: 47 ins; 102 del; 280 mod 8287809: Revisit implementation of memory session Reviewed-by: jvernee ------------- PR: https://git.openjdk.org/jdk19/pull/22 From mdoerr at openjdk.org Mon Jul 11 15:36:35 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 11 Jul 2022 15:36:35 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v2] In-Reply-To: References: Message-ID: > Preserve volatile vector registers in ZGC C2 load barrier stub. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Avoid using more than the volatile program storage (288 Bytes) on stack below the SP. 
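The space constraint named in this commit message can be illustrated with some rough arithmetic. This is a sketch under stated assumptions, not the stub's actual frame layout: it assumes 16 bytes per 128-bit vector register and 20 volatile vector registers (VR0-VR19, as stated elsewhere in this thread). The class, method, and parameter names are hypothetical.

```java
// Hypothetical illustration; not HotSpot code.
final class BelowSpBudget {
    static final int VOLATILE_STORAGE_BYTES = 288; // from the commit message
    static final int VECTOR_REG_BYTES = 16;        // assumption: 128-bit vector register
    static final int VOLATILE_VECTOR_REGS = 20;    // VR0-VR19, per the thread

    // Bytes needed to spill the given number of vector registers plus
    // whatever else the stub saves (otherBytes is a free parameter here).
    static int bytesNeeded(int liveVectorRegs, int otherBytes) {
        return liveVectorRegs * VECTOR_REG_BYTES + otherBytes;
    }

    // True if the spill area fits into the storage below the SP.
    static boolean fitsBelowSp(int liveVectorRegs, int otherBytes) {
        return bytesNeeded(liveVectorRegs, otherBytes) <= VOLATILE_STORAGE_BYTES;
    }

    public static void main(String[] args) {
        // Worst case for vector registers alone, ignoring all other saved state.
        System.out.println(bytesNeeded(VOLATILE_VECTOR_REGS, 0));
        System.out.println(fitsBelowSp(VOLATILE_VECTOR_REGS, 0));
    }
}
```

Under these assumptions the vector registers alone can already exceed the budget, which is consistent with the save and restore sequence having to change.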
------------- Changes: - all: https://git.openjdk.org/jdk/pull/9453/files - new: https://git.openjdk.org/jdk/pull/9453/files/a1ca2ea5..bb0513c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=00-01 Stats: 29 lines in 1 file changed: 11 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/9453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9453/head:pull/9453 PR: https://git.openjdk.org/jdk/pull/9453 From mdoerr at openjdk.org Mon Jul 11 15:36:37 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 11 Jul 2022 15:36:37 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 14:09:58 GMT, Martin Doerr wrote: > Preserve volatile vector registers in ZGC C2 load barrier stub. Thanks for the prompt review! I have noticed that the first version may use more space below SP than allowed by ABI. 288 Bytes below SP are "volatile program storage", but we may use more when including the vector registers. I had to change the save & restore sequence a bit. ------------- PR: https://git.openjdk.org/jdk/pull/9453 From kvn at openjdk.org Mon Jul 11 16:41:43 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Jul 2022 16:41:43 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation In-Reply-To: References: Message-ID: <8O41Cd2Okto_f1iAJobk-wn6ohUBO2FzNkN2rgtFVzM=.83a57d23-1694-45b1-a893-c42e2adc8be0@github.com> On Mon, 11 Jul 2022 12:55:02 GMT, Jatin Bhateja wrote: > - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. 
> - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). > > Please review and share your feedback. > > Best Regards, > Jatin test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 138: > 136: "Xlog", > 137: "UseAVX", > 138: "UseKNLSetting", I don't think we should add these flags to whitelist in these changes - they can affect generated code. New RFE is filed already to handle such flags: [8289801](https://bugs.openjdk.org/browse/JDK-8289801) ------------- PR: https://git.openjdk.org/jdk/pull/9452 From aph at openjdk.org Mon Jul 11 16:39:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 11 Jul 2022 16:39:41 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: On Sun, 10 Jul 2022 23:16:42 GMT, David Holmes wrote: > There are failures in tier1, 2 and 3 (ZGC use is in tier3). Most failures on Macos. > > Failing tests: > > compiler/codecache/stress/RandomAllocationTest.java compiler/codegen/TestOopCmp.java compiler/unsafe/UnsafeGetConstantField.java runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java#id4 Got it. This is a corner case, triggered when we have a reloc of the form `adrp; movk; add` or `adrp; movk, {ld,st}r[reg, #ofs]`. It's a pre-existing bug which never has had any effect because this form is only ever used when the target is a fixed address. The assertion is new. I'm running tier1 again now on MacOS. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From jwilhelm at openjdk.org Mon Jul 11 16:19:55 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Mon, 11 Jul 2022 16:19:55 GMT Subject: Integrated: Merge jdk19 In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 12:38:11 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 19 -> JDK 20 This pull request has now been integrated. Changeset: c79baaa8 Author: Jesper Wilhelmsson URL: https://git.openjdk.org/jdk/commit/c79baaa811971c43fbdbc251482d0e40903588cc Stats: 411 lines in 43 files changed: 284 ins; 26 del; 101 mod Merge ------------- PR: https://git.openjdk.org/jdk/pull/9450 From aph at openjdk.org Mon Jul 11 16:33:50 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 11 Jul 2022 16:33:50 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: References: Message-ID: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/ee6e4189..60693a49 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=00-01 Stats: 65 lines in 1 file changed: 47 ins; 10 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From jbhateja at openjdk.org Mon Jul 11 21:03:30 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Jul 2022 21:03:30 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: > - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. > - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). > > Please review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8290066: Removing newly added white listed options. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/9452/files - new: https://git.openjdk.org/jdk/pull/9452/files/923abfd0..c7036cde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9452&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9452&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9452.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9452/head:pull/9452 PR: https://git.openjdk.org/jdk/pull/9452 From jbhateja at openjdk.org Mon Jul 11 21:03:30 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Jul 2022 21:03:30 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: <8O41Cd2Okto_f1iAJobk-wn6ohUBO2FzNkN2rgtFVzM=.83a57d23-1694-45b1-a893-c42e2adc8be0@github.com> References: <8O41Cd2Okto_f1iAJobk-wn6ohUBO2FzNkN2rgtFVzM=.83a57d23-1694-45b1-a893-c42e2adc8be0@github.com> Message-ID: On Mon, 11 Jul 2022 16:38:12 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8290066: Removing newly added white listed options. > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 138: > >> 136: "Xlog", >> 137: "UseAVX", >> 138: "UseKNLSetting", > > I don't think we should add these flags to whitelist in these changes - they can affect generated code. > New RFE is filed already to handle such flags: [8289801](https://bugs.openjdk.org/browse/JDK-8289801) Done. 
------------- PR: https://git.openjdk.org/jdk/pull/9452 From dlong at openjdk.org Mon Jul 11 22:43:02 2022 From: dlong at openjdk.org (Dean Long) Date: Mon, 11 Jul 2022 22:43:02 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v2] In-Reply-To: References: Message-ID: <_TCVGOPioj8dj6ZJkBxBOj2S26Z5SrVI4oQ4Jxg_bG8=.941452c4-e0b8-4161-b09c-359e74c7cfb0@github.com> On Mon, 11 Jul 2022 15:36:35 GMT, Martin Doerr wrote: >> Preserve volatile vector registers in ZGC C2 load barrier stub. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Avoid using more than the volatile program storage (288 Bytes) on stack below the SP. Does this need to be fixed in jdk19? ------------- PR: https://git.openjdk.org/jdk/pull/9453 From kvn at openjdk.org Mon Jul 11 23:49:43 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 11 Jul 2022 23:49:43 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 21:03:30 GMT, Jatin Bhateja wrote: >> - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. >> - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8290066: Removing newly added white listed options. Good. I will start testing. 
------------- PR: https://git.openjdk.org/jdk/pull/9452 From mdoerr at openjdk.org Tue Jul 12 04:56:26 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 12 Jul 2022 04:56:26 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 15:36:35 GMT, Martin Doerr wrote: >> Preserve volatile vector registers in ZGC C2 load barrier stub. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Avoid using more than the volatile program storage (288 Bytes) on stack below the SP. Would be nice to have in 19, but it doesn't apply cleanly. There is a workaround. I prefer avoiding merging work for Oracle employees. We need it in 17u and 21 LTS. ------------- PR: https://git.openjdk.org/jdk/pull/9453 From dholmes at openjdk.org Tue Jul 12 05:42:30 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 12 Jul 2022 05:42:30 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> Message-ID: On Mon, 11 Jul 2022 16:33:50 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic I re-ran our tests and they are now passing. Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From rrich at openjdk.org Tue Jul 12 07:15:45 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 12 Jul 2022 07:15:45 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 15:36:35 GMT, Martin Doerr wrote: >> Preserve volatile vector registers in ZGC C2 load barrier stub. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Avoid using more than the volatile program storage (288 Bytes) on stack below the SP. src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp line 486: > 484: assert(SuperwordUseVSX, "or should not reach here"); > 485: VectorSRegister vs_reg = vm_reg->as_VectorSRegister(); > 486: if (vs_reg->encoding() >= VSR32->encoding() && vs_reg->encoding() <= VSR51->encoding()) { Why VSR32 as lower bound? I read in ppc.ad 1st 32 VSRs are aliases for the FPRs wich are already defined above. Could you please help and explain what this means? Why VSR51 as upper bound? I'd suggest to update the comment in register_ppc.hpp and explain the vector scalar registers. What is the difference between vector and vector scalar registers? ------------- PR: https://git.openjdk.org/jdk/pull/9453 From mdoerr at openjdk.org Tue Jul 12 08:06:41 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 12 Jul 2022 08:06:41 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v2] In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 07:13:42 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid using more than the volatile program storage (288 Bytes) on stack below the SP. 
> > src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp line 486: > >> 484: assert(SuperwordUseVSX, "or should not reach here"); >> 485: VectorSRegister vs_reg = vm_reg->as_VectorSRegister(); >> 486: if (vs_reg->encoding() >= VSR32->encoding() && vs_reg->encoding() <= VSR51->encoding()) { > > Why VSR32 as lower bound? I read in ppc.ad > > 1st 32 VSRs are aliases for the FPRs which are already defined above. > > Could you please help and explain what this means? > > Why VSR51 as upper bound? > > I'd suggest updating the comment in register_ppc.hpp to explain the vector scalar registers. > What is the difference between vector and vector scalar registers? Thanks for looking at it! VSRs are not separate registers. They contain the regular FPRs (mapped to 0-31) and VRs (mapped to 32-63). FPRs are managed separately while the VRs are not defined elsewhere in the ppc.ad file. There are instructions which operate on VSRs and can access FPRs and VRs. This was tricky to implement in hotspot ([JDK-8188139](https://bugs.openjdk.org/browse/JDK-8188139) and many follow-up fixes). Only the VRs VR0-VR19 are volatile (see register_ppc.hpp), so only these ones need spilling. (Same is done for other register types.) VR0-VR19 = VSR32-VSR51. Note that only these ones are currently used by C2 (see `reg_class vs_reg` in ppc.ad). The reason is that we currently don't preserve the non-volatile ones in the Java entry frame. ------------- PR: https://git.openjdk.org/jdk/pull/9453 From fyang at openjdk.org Tue Jul 12 08:23:35 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Jul 2022 08:23:35 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 Message-ID: Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset.
In this case, we can emit a load + add/sub + store sequence that adds an arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that the code logic is clearer at the call sites. Test: hotspot-tier1 & jdk-tier1 with QEMU. ------------- Commit messages: - JDK-8290137: riscv: small refactoring for add_memory_int32/64 Changes: https://git.openjdk.org/jdk/pull/9461/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9461&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290137 Stats: 87 lines in 8 files changed: 40 ins; 0 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/9461.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9461/head:pull/9461 PR: https://git.openjdk.org/jdk/pull/9461 From yadongwang at openjdk.org Tue Jul 12 08:46:42 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Tue, 12 Jul 2022 08:46:42 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: References: Message-ID: <59-2KbKaiqBq8CjPKZ3wP70yjJHNOC8Lb8UBBHXF3i4=.abc431f5-ee0a-4099-9903-245b4e31ee22@github.com> On Tue, 12 Jul 2022 08:14:05 GMT, Fei Yang wrote: > Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit a load + add/sub + store sequence that adds an arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. > > We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that the code logic is clearer at the call sites. > > Test: hotspot-tier1 & jdk-tier1 with QEMU.
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2965: > 2963: assert(!adr.uses(t0), "invalid dst for address increment"); > 2964: ld(t0, adr); > 2965: add(t0, t0, value, t1); It's not safe to clobber t1 sometimes. And I think it's better to limit the accepted value to 12 bits or less. ------------- PR: https://git.openjdk.org/jdk/pull/9461 From rrich at openjdk.org Tue Jul 12 09:04:42 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 12 Jul 2022 09:04:42 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v2] In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 08:02:51 GMT, Martin Doerr wrote: > Thanks for looking at it! VSRs are not separate registers. They contain the > regular FPRs (mapped to 0-31) and VRs (mapped to 32-63). FPRs are managed > separately while the VRs are not defined elsewhere in the ppc.ad file. Thanks. I think this should be better explained in register_ppc.hpp. > There are instructions which operate on VSRs and can access FPRs and VRs. This > was tricky to implement in hotspot > ([JDK-8188139](https://bugs.openjdk.org/browse/JDK-8188139) and many follow-up > fixes). Only the VRs VR0-VR19 are volatile (see register_ppc.hpp), so only > these ones need spilling. (Same is done for other register types.) VR0-VR19 = > VSR32-VSR51 > Note that only these ones are currently used by C2 (see `reg_class > vs_reg` in ppc.ad). Reason is that we currently don't preserve the > non-volatile ones in the Java entry frame. I see. VSR52-VSR64 are declared SOC in ppc.ad. Shouldn't they be SOE then? 
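The register mapping described in the quoted explanation can be sketched as follows. This is an illustration in Java, not HotSpot code: the class and method names are hypothetical, and the constants only restate what the thread says (FPRs at VSR 0-31, VRs at VSR 32-63, VR0-VR19 volatile).

```java
// Hypothetical illustration of the VSR numbering described in the thread.
final class VsrMapping {
    static final int NUM_FPRS = 32;         // VSR0-VSR31 alias the FPRs
    static final int NUM_VSRS = 64;         // VSR0-VSR63 in total
    static final int LAST_VOLATILE_VR = 19; // VR0-VR19 are volatile, per the thread

    // Map a vector register number (VR0-VR31) to its VSR number.
    static int vrToVsr(int vr) {
        if (vr < 0 || vr >= NUM_VSRS - NUM_FPRS) {
            throw new IllegalArgumentException("no such VR: " + vr);
        }
        return vr + NUM_FPRS;
    }

    // True if the VSR is one of the volatile vector registers, i.e. one
    // that a stub would have to preserve across a call.
    static boolean isVolatileVectorVsr(int vsr) {
        return vsr >= vrToVsr(0) && vsr <= vrToVsr(LAST_VOLATILE_VR);
    }

    public static void main(String[] args) {
        System.out.println("VR0  -> VSR" + vrToVsr(0));
        System.out.println("VR19 -> VSR" + vrToVsr(LAST_VOLATILE_VR));
    }
}
```

The range test has the same bounds as the condition on line 486 of the reviewed file (VSR32 up to VSR51).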
------------- PR: https://git.openjdk.org/jdk/pull/9453 From fyang at openjdk.org Tue Jul 12 09:35:42 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Jul 2022 09:35:42 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: <59-2KbKaiqBq8CjPKZ3wP70yjJHNOC8Lb8UBBHXF3i4=.abc431f5-ee0a-4099-9903-245b4e31ee22@github.com> References: <59-2KbKaiqBq8CjPKZ3wP70yjJHNOC8Lb8UBBHXF3i4=.abc431f5-ee0a-4099-9903-245b4e31ee22@github.com> Message-ID: <1A665u7Rgv_VcLH7StRvH05H1GjEEhK3S4tKdfTecH4=.3fd7ea58-95c9-42b6-9ffc-f78bfd7d0a24@github.com> On Tue, 12 Jul 2022 08:42:45 GMT, Yadong Wang wrote: >> Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit a load + add/sub + store sequence that adds an arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. >> >> We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that the code logic is clearer at the call sites. >> >> Test: hotspot-tier1 & jdk-tier1 with QEMU. > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2965: > >> 2963: assert(!adr.uses(t0), "invalid dst for address increment"); >> 2964: ld(t0, adr); >> 2965: add(t0, t0, value, t1); > > It's not safe to clobber t1 sometimes. And I think it's better to limit the accepted value to 12 bits or less. In fact, we still need to trash t1 here when we have some unexpected memory address. Callers should be aware of this.
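The 12-bit limit under discussion can be made concrete with a small sketch. It assumes only the RISC-V rule that `addi`/`addiw` take a sign-extended 12-bit immediate; the class and method names are hypothetical, and whether the real four-argument `add` touches the temp register only for out-of-range values is an assumption here, not something the thread confirms.

```java
// Hypothetical illustration; not the macro assembler's actual logic.
final class Imm12 {
    // Range of a sign-extended 12-bit immediate.
    static final long MIN_SIMM12 = -2048;
    static final long MAX_SIMM12 = 2047;

    // True if the value can be encoded directly in addi/addiw.
    static boolean fitsSimm12(long value) {
        return value >= MIN_SIMM12 && value <= MAX_SIMM12;
    }

    // Assumption: a larger immediate has to be materialized in a
    // scratch register before it can be added.
    static boolean needsScratchForAdd(long value) {
        return !fitsSimm12(value);
    }

    public static void main(String[] args) {
        for (long v : new long[] {1, 2047, 2048, -2048, -2049}) {
            System.out.println(v + " needs scratch: " + needsScratchForAdd(v));
        }
    }
}
```

This is the trade-off in the review comment: restricting the accepted value to the 12-bit range would avoid the extra scratch register for the add itself, at the cost of the generality the change is trying to add.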
------------- PR: https://git.openjdk.org/jdk/pull/9461 From mdoerr at openjdk.org Tue Jul 12 09:37:35 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 12 Jul 2022 09:37:35 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v3] In-Reply-To: References: Message-ID: > Preserve volatile vector registers in ZGC C2 load barrier stub. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Update SOE spec for VSR regs. Add comment to register_ppc.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9453/files - new: https://git.openjdk.org/jdk/pull/9453/files/bb0513c1..2d8fa980 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=01-02 Stats: 40 lines in 2 files changed: 4 ins; 0 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/9453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9453/head:pull/9453 PR: https://git.openjdk.org/jdk/pull/9453 From mdoerr at openjdk.org Tue Jul 12 09:43:54 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 12 Jul 2022 09:43:54 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v4] In-Reply-To: References: Message-ID: > Preserve volatile vector registers in ZGC C2 load barrier stub. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Update SOE spec for VSR regs. Add comment to register_ppc.hpp - Avoid using more than the volatile program storage (288 Bytes) on stack below the SP. 
- 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers ------------- Changes: https://git.openjdk.org/jdk/pull/9453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=03 Stats: 121 lines in 5 files changed: 41 ins; 6 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/9453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9453/head:pull/9453 PR: https://git.openjdk.org/jdk/pull/9453 From mdoerr at openjdk.org Tue Jul 12 09:46:16 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 12 Jul 2022 09:46:16 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v2] In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 09:00:44 GMT, Richard Reingruber wrote: >> Thanks for looking at it! >> VSRs are not separate registers. They contain the regular FPRs (mapped to 0-31) and VRs (mapped to 32-63). FPRs are managed separately while the VRs are not defined elsewhere in the ppc.ad file. There are instructions which operate on VSRs and can access FPRs and VRs. This was tricky to implement in hotspot ([JDK-8188139](https://bugs.openjdk.org/browse/JDK-8188139) and many follow-up fixes). >> Only the VRs VR0-VR19 are volatile (see register_ppc.hpp), so only these ones need spilling. (Same is done for other register types.) >> VR0-VR19 = VSR32-VSR51 >> Note that only these ones are currently used by C2 (see `reg_class vs_reg` in ppc.ad). Reason is that we currently don't preserve the non-volatile ones in the Java entry frame. > >> Thanks for looking at it! VSRs are not separate registers. They contain the >> regular FPRs (mapped to 0-31) and VRs (mapped to 32-63). FPRs are managed >> separately while the VRs are not defined elsewhere in the ppc.ad file. > > Thanks. I think this should be better explained in register_ppc.hpp. > >> There are instructions which operate on VSRs and can access FPRs and VRs. 
This >> was tricky to implement in hotspot >> ([JDK-8188139](https://bugs.openjdk.org/browse/JDK-8188139) and many follow-up >> fixes). Only the VRs VR0-VR19 are volatile (see register_ppc.hpp), so only >> these ones need spilling. (Same is done for other register types.) VR0-VR19 = >> VSR32-VSR51 >> Note that only these ones are currently used by C2 (see `reg_class >> vs_reg` in ppc.ad). Reason is that we currently don't preserve the >> non-volatile ones in the Java entry frame. > > I see. VSR52-VSR64 are declared SOC in ppc.ad. Shouldn't they be SOE then? I've added a comment to register_ppc.hpp. Right, they should be SOE. Changed. Note that this doesn't have any effect because the SOE registers are not allocated by C2. But should get fixed to avoid confusion and for possible future usage. ------------- PR: https://git.openjdk.org/jdk/pull/9453 From mdoerr at openjdk.org Tue Jul 12 09:54:59 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 12 Jul 2022 09:54:59 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v5] In-Reply-To: References: Message-ID: > Preserve volatile vector registers in ZGC C2 load barrier stub. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix typo in comment. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/9453/files - new: https://git.openjdk.org/jdk/pull/9453/files/f6d238ed..fab3fa4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9453&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9453/head:pull/9453 PR: https://git.openjdk.org/jdk/pull/9453 From rrich at openjdk.org Tue Jul 12 09:55:00 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 12 Jul 2022 09:55:00 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v3] In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 09:37:35 GMT, Martin Doerr wrote: >> Preserve volatile vector registers in ZGC C2 load barrier stub. > > Martin Doerr has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. Thanks Martin, your changes look good to me now. The comments in register_ppc.hpp could still be improved, though. E.g. the comment refers to `v` and `vs` registers but the declared names are `VR` and `VSR`. Probably the declared names should be changed, but that is not something to do in this PR. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.org/jdk/pull/9453 From mdoerr at openjdk.org Tue Jul 12 09:55:02 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 12 Jul 2022 09:55:02 GMT Subject: RFR: 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers [v4] In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 09:43:54 GMT, Martin Doerr wrote: >> Preserve volatile vector registers in ZGC C2 load barrier stub. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains three commits: > > - Update SOE spec for VSR regs. Add comment to register_ppc.hpp > - Avoid using more than the volatile program storage (288 Bytes) on stack below the SP. > - 8290082: [PPC64] ZGC C2 load barrier stub needs to preserve vector registers Thanks for the reviews! I just fixed a typo in a comment. ------------- PR: https://git.openjdk.org/jdk/pull/9453 From adinn at openjdk.org Tue Jul 12 10:10:43 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 12 Jul 2022 10:10:43 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> Message-ID: On Mon, 11 Jul 2022 16:33:50 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 78: > 76: // Patch any kind of instruction; there may be several instructions. > 77: // Return the total length (in bytes) of the instructions. > 78: int MacroAssembler::pd_patch_instruction_size(address branch, address target) { You really need to document what you are doing up front here by explaining what patchable sequences arise, categorizing them by leading insn and then explaining how that categorization can be automated by means of a switch on certain insn bits plus, in some cases, auxiliary bit tests.
I suggest the following as a lead-in comment:

// Instruction sequences whose target may need to be retrieved or
// patched can be distinguished by their leading instruction,
// sorting them into three main instruction groups and
// related subgroups.
//
// 1) Branch, Exception and System (insn count = 1)
//    1a) Unconditional branch (immediate):
//      b/bl imm26
//    1b) Compare & branch (immediate):
//      cbz/cbnz Rt imm19
//    1c) Test & branch (immediate):
//      tbz/tbnz Rt imm14
//    1d) Conditional branch (immediate):
//      b.cond imm19
//
// 2) Loads and Stores (insn count = 1)
//    2a) Load register literal:
//      ldr Rt imm19
//
// 3) Data Processing Immediate (insn count = 2 or 3)
//    3a) PC-rel. addressing
//      adr/adrp Rx imm21; ldr/str Ry Rx #imm12
//      adr/adrp Rx imm21; add Ry Rx #imm12
//      adr/adrp Rx imm21; movk Rx #imm16<<32; ldr/str Ry, [Rx, #offset_in_page]
//      adr/adrp Rx imm21; movk Rx #imm16<<32; add Ry, Rx, #offset_in_page
//      adr/adrp Rx imm21; movk Rx #imm16<<32
//      adr/adrp Rx imm21
//    3b) Move wide (immediate)
//      movz Rx #imm16; movk Rx #imm16 << 16; movk Rx #imm16 << 32;
//
// A switch on a subset of the instruction's bits provides an efficient
// dispatch to these subcases.
//
// insn[28:26] -> main group ('x' == don't care)
//   00x -> UNALLOCATED
//   100 -> Data Processing Immediate
//   101 -> Branch, Exception and System
//   x1x -> Loads and Stores
//
// insn[30:25] -> subgroup ('_' == group, 'x' == don't care).
// n.b. in some cases extra bits need to be checked to verify the
// instruction is as expected
//
// 1) ... xx101x Branch, Exception and System
//   1a)  00___x Unconditional branch (immediate)
//   1b)  01___0 Compare & branch (immediate)
//   1c)  01___1 Test & branch (immediate)
//   1d)  10___0 Conditional branch (immediate)
//        other  Should not happen
//
// 2) ... xxx1x0 Loads and Stores
//   2a)  xx1_x_ Load/Store register
//   2aa) x01_x_0 Load register literal (n.b. requires insn[24] == 0)
//        strictly should be 64 bit non-FP/SIMD i.e.
//        0101_0_0 (i.e. requires insn[31:24] == 01011000)
//
// 3) ... xx100x Data Processing Immediate
//   3a)  xx___00 PC-rel. addressing (n.b. requires insn[24] == 0)
//   3b)  xx___101 Move wide (immediate) (n.b. requires insn[24:23] == 01)
//        strictly should be 64 bit movz #imm16<<0
//        110___10100 (i.e. requires insn[31:21] == 11010010100)
//

This means you no longer need most of the comments that occur inline in `pd_patch_instruction_size` and `target_addr_for_insn`. You can simply refer to cases 1a, 1b, ..., 2a etc. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From adinn at openjdk.org Tue Jul 12 10:14:45 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 12 Jul 2022 10:14:45 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> Message-ID: On Mon, 11 Jul 2022 16:33:50 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 84: > 82: // Return the total length (in bytes) of the instructions. > 83: int MacroAssembler::pd_patch_instruction_size(address insn_addr, address target) { > 84: int instructions = 1; I don't like the fact that you have to replicate this switch and associated validation logic in the two separate methods `pd_patch_instruction_size` and `target_addr_for_insn`. How about making both of them call a common auxiliary method which will either retrieve a target address or patch it? i.e.
have `int MacroAssembler::pd_patch_instruction_size(address insn_addr, address target)` and `address MacroAssembler::target_addr_for_insn(address insn_addr, uint32_t insn)` both call method `int MacroAssembler::fetch_or_patch_target_addr(address insn_addr, uint32_t insn, address &target, boolean do_patch)` That means you can have one copy of the dispatch logic with common verification in each end case and variant action in each case depending on the value of `do_patch`. It also means the explanatory comment above only needs to exist in one place at the head of method `fetch_or_patch_target_addr`. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From adinn at openjdk.org Tue Jul 12 10:19:51 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 12 Jul 2022 10:19:51 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> Message-ID: On Tue, 12 Jul 2022 10:12:39 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> 8289743: AArch64: Clean up patching logic > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 84: > >> 82: // Return the total length (in bytes) of the instructions. >> 83: int MacroAssembler::pd_patch_instruction_size(address insn_addr, address target) { >> 84: int instructions = 1; > > I don't like the fact that you have to replicate this switch and associated validation logic in the two separate methods `pd_patch_instruction_size` and `target_addr_for_insn`. How about making both of them call a common auxiliary method which will either retrieve a target address or patch it? i.e.
have > > `int MacroAssembler::pd_patch_instruction_size(address insn_addr, address target)` > and > `address MacroAssembler::target_addr_for_insn(address insn_addr, uint32_t insn)` > > both call method > > `int MacroAssembler::fetch_or_patch_target_addr(address insn_addr, uint32_t insn, address &target, boolean do_patch)` > > That means you can have one copy of the dispatch logic with common verification in each end case and variant action in each case depending on the value of `do_patch`. > > It also means the explanatory comment above only needs to exist in one place at the head of method `fetch_or_patch_target_addr` n.b. I forgot to explain but I hope it is obvious that `target` is passed as a reference so it can be an input when `do_patch == true` and an output when `do_patch == false`. The return value is the instruction count in either case. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From ksakata at openjdk.org Tue Jul 12 10:23:41 2022 From: ksakata at openjdk.org (Koichi Sakata) Date: Tue, 12 Jul 2022 10:23:41 GMT Subject: RFR: 8280472: Don't mix legacy logging with UL In-Reply-To: References: Message-ID: <46Pi-jLrSB95ROTkMBsaS8kBIghHMVdss5MDY6mdaIo=.96f82462-d483-411f-8a7a-d706442ef8aa@github.com> On Thu, 16 Jun 2022 01:39:33 GMT, Koichi Sakata wrote: > This PR removes extra conditions related to Unified Logging. > > Those conditions were left over after the transition to Unified Logging. This is the only place that uses UL together with the Verbose and WizardMode flags. The JBS issue suggests removing those flags. > > # Details > At present, outputting the target log messages requires a debug build of OpenJDK and either the Verbose or the WizardMode option.
> > $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=info -XX:+Verbose -version > (Omitted) > [0.090s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object; > [0.090s][info][methodhandles] {method} > [0.090s][info][methodhandles] - this oop: 0x00000001303d6238 > [0.090s][info][methodhandles] - method holder: public synchronized abstract 'java/lang/invoke/MethodHandle' > (Omitted) > [0.090s][info][methodhandles] - signature handler: 0x0000000000000000 > [0.090s][info][methodhandles] lookup_polymorphic_method => intrinsic {method} > [0.090s][info][methodhandles] - this oop: 0x00000001303d6238 > > Target log messages are from `{method}` to `- signature handler`. > > > $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=info -version > (Omitted) > [0.134s][info][methodhandles] lookup_polymorphic_method linkToStatic (Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object; => basic (Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object; > [0.134s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object; > [0.134s][info][methodhandles] lookup_polymorphic_method => intrinsic {method} > [0.134s][info][methodhandles] - this oop: 0x000000012abd5e58 > > When those flags are off, UL doesn't output them. > > # Test > There is no test code for it. So I built and run OpenJDK to confirm log output by myself. 
> > ## Run with Log Level DEBUG After Applying This Patch > > $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=debug -version > (Omitted) > [0.132s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object; > [0.132s][debug][methodhandles] {method} > [0.132s][debug][methodhandles] - this oop: 0x00000001217d6b98 > (Omitted) > [0.132s][debug][methodhandles] - signature handler: 0x0000000000000000 > [0.132s][info ][methodhandles] lookup_polymorphic_method => intrinsic {method} > [0.132s][info ][methodhandles] - this oop: 0x00000001217d6b98 > > UL outputted target log messages with the debug level. It was successful. > > ## Run with Log Level INFO After Applying This Patch > > $ jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java -Xlog:methodhandles=info -version > (Omitted) > [0.086s][info][methodhandles] make_method_handle_intrinsic MH.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object; > [0.086s][info][methodhandles] lookup_polymorphic_method => intrinsic {method} > [0.086s][info][methodhandles] - this oop: 0x000000011f7d69f8 > > UL didn't output them. That was as I intended. Thank you, David and Markus. I really appreciate it. 
------------- PR: https://git.openjdk.org/jdk/pull/9175 From aph at openjdk.org Tue Jul 12 10:49:42 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 12 Jul 2022 10:49:42 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> Message-ID: On Tue, 12 Jul 2022 10:15:56 GMT, Andrew Dinn wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 84: >> >>> 82: // Return the total length (in bytes) of the instructions. >>> 83: int MacroAssembler::pd_patch_instruction_size(address insn_addr, address target) { >>> 84: int instructions = 1; >> >> I don't like the fact that you have to replicate this switch and associated validation logic in the two separate methods `pd_patch_instruction_size` and `target_addr_for_insn`. How about making both of them call a common auxiliary method which will either retrieve a target address or patch it? i.e. have >> >> `int MacroAssembler::pd_patch_instruction_size(address insn_addr, address target)` >> and >> `address MacroAssembler::target_addr_for_insn(address insn_addr, uint32_t insn)` >> >> both call method >> >> `int MacroAssembler::fetch_or_patch_target_addr(address insn_addr, uint32_t insn, address &target, boolean do_patch)` >> >> That means you can have one copy of the dispatch logic with common verification in each end case and variant action in each case depending on the value of `do_patch`. >> >> It also means the explanatory comment above only nees to exist in one place at the head of method `fetch_or_patch_target_addr` > > n.b. I forgot to explain but I hope it is obvious that `target` is passed as a reference so it can be an input when `do_patch == true` and an output when `do_patch == false`. The return value is the instruction count in either case. 
> I don't like the fact that you have to replicate this switch and associated validation logic in the two separate methods `pd_patch_instruction_size` and `target_addr_for_insn`. How about making both of them call a common auxiliary method which will either retrieve a target address or patch it? i.e. have I thought about that, but it ends up being rather complex. A function that does two different things depending on a boolean parameter is something of a code smell. I'll have another look. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From yadongwang at openjdk.org Tue Jul 12 11:23:50 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Tue, 12 Jul 2022 11:23:50 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 08:14:05 GMT, Fei Yang wrote: > Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call the addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit a load + add/sub + store sequence that adds an arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. > > We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that the code logic is clearer at the call sites. > > Test: hotspot-tier1 & jdk-tier1 with QEMU. Marked as reviewed by yadongwang (Author).
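The 12-bit constraint described in the PR text above can be illustrated with a small range check. This is only a sketch under stated assumptions: `fits_simm12`, `AddStrategy` and `choose_add_strategy` are hypothetical names invented for illustration, not functions from the HotSpot RISC-V port, and the two-way split merely mirrors the description (use the immediate form when the increment fits, otherwise go through a scratch register).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `value` fits in a sign-extended 12-bit
// immediate, i.e. lies in [-2048, 2047], the range accepted by addi/addiw.
constexpr bool fits_simm12(int64_t value) {
  return value >= -2048 && value <= 2047;
}

// Hypothetical classification of how an increment could be emitted.
enum class AddStrategy {
  kImmediateAdd,      // increment fits directly into addi/addiw
  kMaterializeAndAdd  // place the increment in a scratch register, then add
};

// Picks the strategy for a given increment (illustration only).
constexpr AddStrategy choose_add_strategy(int64_t increment) {
  return fits_simm12(increment) ? AddStrategy::kImmediateAdd
                                : AddStrategy::kMaterializeAndAdd;
}
```

For instance, `choose_add_strategy(1)` selects the immediate form, while `choose_add_strategy(4096)` falls back to the scratch-register form.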
------------- PR: https://git.openjdk.org/jdk/pull/9461 From duke at openjdk.org Tue Jul 12 11:49:08 2022 From: duke at openjdk.org (Ludvig Janiuk) Date: Tue, 12 Jul 2022 11:49:08 GMT Subject: RFR: JDK-8290020 Deadlock in leakprofiler::emit_events during shutdown Message-ID: <0h6r1AuS7NK-oWQpZsJ7hEel5jWZ3rpT_7rvF5S-SNQ=.d270906d-d503-4d2d-98d3-5f4e73f7258f@github.com> Add a boolean parameter to Jfr::on_vm_shutdown to differentiate the "called from java" case, and in that case to not call JfrEmergencyDump::on_vm_shutdown. ------------- Commit messages: - whitespace - whitespace - Simplify test - Fix test - Update test - Adding an option to detect call from java Changes: https://git.openjdk.org/jdk/pull/9465/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9465&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290020 Stats: 43 lines in 6 files changed: 9 ins; 10 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/9465.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9465/head:pull/9465 PR: https://git.openjdk.org/jdk/pull/9465 From duke at openjdk.org Tue Jul 12 11:49:08 2022 From: duke at openjdk.org (Ludvig Janiuk) Date: Tue, 12 Jul 2022 11:49:08 GMT Subject: RFR: JDK-8290020 Deadlock in leakprofiler::emit_events during shutdown In-Reply-To: <0h6r1AuS7NK-oWQpZsJ7hEel5jWZ3rpT_7rvF5S-SNQ=.d270906d-d503-4d2d-98d3-5f4e73f7258f@github.com> References: <0h6r1AuS7NK-oWQpZsJ7hEel5jWZ3rpT_7rvF5S-SNQ=.d270906d-d503-4d2d-98d3-5f4e73f7258f@github.com> Message-ID: On Tue, 12 Jul 2022 10:39:05 GMT, Ludvig Janiuk wrote: > Add a boolean parameter to Jfr::on_vm_shutdown to differentiate the "called from java" case, and in that case to not call JfrEmergencyDump::on_vm_shutdown. trying to trigger jcheck... 
------------- PR: https://git.openjdk.org/jdk/pull/9465 From adinn at openjdk.org Tue Jul 12 11:57:46 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 12 Jul 2022 11:57:46 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> Message-ID: On Mon, 11 Jul 2022 16:33:50 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 311: > 309: } > 310: case 0b101010: // Compare & branch (immediate) > 311: case 0b011010: // Conditional branch (immediate) btw, the above two comments need swapping to match the correct bit pattern // ... 10___0 --> Conditional branch (immediate) case 0b101010: // ... 01___0 --> Compare & branch (immediate) case 0b011010: ------------- PR: https://git.openjdk.org/jdk/pull/9398 From iklam at openjdk.org Tue Jul 12 19:41:58 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 12 Jul 2022 19:41:58 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 06:48:55 GMT, Thomas Stuefe wrote: > 1. Provides access control for members, which is unavailable with namespaces. > 2. Avoids [Argument Dependent Lookup][ADL] (ADL). > 3. Closed for additional members. Namespaces allow names to be added in multiple contexts, making it harder to see the complete API. I agree with Thomas that for the `os` interface, `#3` is actually the biggest disadvantage of using a class. `os` is a hodge-podge of porting interfaces. 
Usually no one cares what the "complete API" is (the only bad thing I can think of is that a namespace allows declaration of new overloaded functions, which may make the code very hard to read; also see point `#2` below). However, I agree `#1` and `#2` are advantages of using a class. For `#1`, even if we declare 'private' members in separate header files, they can easily be leaked when we have inline functions like: /* public */ inline int os::getFoo() { return os::_private_foo; } Any file that includes `os_inline.hpp` will inadvertently see the private members. We can use a naming convention to mark the members that shouldn't be accessed by outside code, but we can't use the C++ compiler to check for violations. For `#2` (https://en.wikipedia.org/wiki/Argument-dependent_name_lookup), we can partially address it by forbidding the use of `using namespace os`. Again, this is not enforceable by the C++ compiler. --- So without a clear consensus, I don't want to pursue the namespace solution. But I do want to move `os::Linux` outside of os.hpp. Can we still entertain my os_linux_impl.hpp proposal for now? Separately, we should split up the `os` class, like moving the string operations to a different header. ------------- PR: https://git.openjdk.org/jdk/pull/9423 From kvn at openjdk.org Tue Jul 12 21:27:55 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 12 Jul 2022 21:27:55 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 21:03:30 GMT, Jatin Bhateja wrote: >> - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. >> - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd).
>> >> Please review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8290066: Removing newly added white listed options. Testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9452 From jbhateja at openjdk.org Wed Jul 13 02:17:47 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Jul 2022 02:17:47 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 21:03:30 GMT, Jatin Bhateja wrote: >> - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. >> - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8290066: Removing newly added white listed options. Hi @kvn do we need second approval for this. Facing integration issue. ------------- PR: https://git.openjdk.org/jdk/pull/9452 From dholmes at openjdk.org Wed Jul 13 02:56:39 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 13 Jul 2022 02:56:39 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 19:33:13 GMT, Ioi Lam wrote: > Can we still entertain my os_linux_impl.hpp proposal for now? The namespace discussion was just a sidebar and we're not going to go with namespaces. So that just brings things back to what you proposed above. 
> My proposal is to remove all os- and platform- specific includes from os.hpp. The os class should include only functions that are usable from shared code. I don't like the original proposal and don't want to see it as an interim step until "something better" comes along. This is not a critical problem that must be solved today. ------------- PR: https://git.openjdk.org/jdk/pull/9423 From iklam at openjdk.org Wed Jul 13 06:58:51 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 13 Jul 2022 06:58:51 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > renamed to os_linux_impl.hpp I've had some off-line discussion with David. He asked me an enlightening question -- > Why is os_linux.hpp almost empty after your proposal This is when I realized -- `os_linux.hpp` should contain nothing except for the class `os::Linux`! If I understood correctly, the design for the `os` class should be like this -- and most of the code already falls under this pattern: - The `os` class should contain a platform-independent API. - OS-specific implementation should be in either `os_.cpp` or `os_.inline.hpp` - For example, `uses_stack_guard_pages()` is declared in `os.hpp`, and implemented in `os_linux.inline.hpp`. - However, we have a handful of deviations from this pattern.
For example, `zero_page_read_protected()` is declared and implemented in `os_.hpp`, but it's fairly easy to move it to conform to the above pattern. The `os__` files should follow similar rules. For example, there's no reason to declare `setup_fpu()` everywhere (13 times) when it can be declared once in `os.hpp`. Over the years, I think the various os_xxx files were modified by authors who didn't understand the design, and we have lots of weird stuff: - `atomic_copy64()` should not be part of the `os` class, but rather a static function in files like `os_linux_zero.cpp` - `workaround_expand_exec_shield_cs_limit()` should be a static function in `os_linux.cpp` So my revised proposal is: - Get rid of the "include a header file in the middle of the declaration of the os class" travesty - `os.hpp` declares a platform-independent API - `os_.hpp` declares utility functions that are used by the OS-specific implementation of the `os` class. This header file should be included by only the files under `src/hotspot/` or `src/hotspot/_` - Platform-specific implementations should usually be in the `os_.cpp` or `os__.cpp`. - When inlining is necessary for performance reasons, generic code should include `os.inline.hpp`, which in turn includes `os_.hpp`, which includes `os__.hpp`. This way, `os::Linux` can happily live inside `os_linux.hpp`, and won't be included transitively by `os.hpp`. I have done a prototype here. Only Linux/x64 is working. I'll probably discover some problems when doing other os/cpus https://github.com/openjdk/jdk/compare/master...iklam:jdk:xxtemp-normalize-os-hpp-inclusion If people think this is worth pursuing, I will probably close this PR and open a new one, because the changes are much larger than I originally anticipated.
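The layering in the revised proposal above (declare the API once in a shared header, implement it per platform, keep purely internal helpers file-local) can be sketched in a single translation unit. Everything here is an assumption made for illustration: `os_sketch`, `fpu_setup_count` and `platform_fpu_workaround` are hypothetical names, the return values are placeholders, and the comments only indicate which file each part would notionally live in. It is not the prototype linked above.

```cpp
#include <cassert>

// ---- would live in the shared header (the role os.hpp plays) ----
class os_sketch {
 public:
  static bool uses_stack_guard_pages();  // platform-independent API
  static void setup_fpu();               // declared once, not once per platform
  static int  fpu_setup_count();         // exists only to make the sketch observable
};

// ---- would live in one platform's .cpp file ----
namespace {
int g_fpu_setups = 0;  // file-local state, invisible to other files

// Helper used only inside this file, so it is a local function rather
// than a member of the class declared in the shared header.
void platform_fpu_workaround() { ++g_fpu_setups; }
}  // namespace

bool os_sketch::uses_stack_guard_pages() { return true; }  // placeholder value
void os_sketch::setup_fpu() { platform_fpu_workaround(); }
int  os_sketch::fpu_setup_count() { return g_fpu_setups; }
```

Shared code would see only the `os_sketch` declarations; the helper in the unnamed namespace never appears in any header, which is the "local statics" treatment suggested later in the thread.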
------------- PR: https://git.openjdk.org/jdk/pull/9423 From kvn at openjdk.org Wed Jul 13 07:09:49 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 13 Jul 2022 07:09:49 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 02:11:06 GMT, Jatin Bhateja wrote: > Hi @kvn do we need second approval for this. Facing integration issue. It is known issue which is investigated. Yes, you need second review. ------------- PR: https://git.openjdk.org/jdk/pull/9452 From stuefe at openjdk.org Wed Jul 13 07:19:48 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 13 Jul 2022 07:19:48 GMT Subject: RFR: 8265473: Move os::Linux to its own header file [v4] In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 23:27:31 GMT, Ioi Lam wrote: >> Another step of moving unnecessary stuff outside of os.hpp >> >> The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. >> >> I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > renamed to os_linux_impl.hpp > So my revised proposal is: > > * Get rid of the "include a header file in the middle of the declaration of the os class" travesty > * `os.hpp` declares a platform-independent API > * `os_.hpp` declares utility functions that are used by the OS-specific implementation of the `os` class. This header file should be included by only the files under `src/hotspot/` or `src/hotspot/_` > * Platform-specific implementations should usually be in the `os_.cpp` or `os__.cpp`. 
> * When inlining is necessary for performance reasons, generic code should include `os.inline.hpp`, which in turn includes `os_.hpp`, which includes `os__.hpp`. > > This way, `os::Linux` can happily live inside `os_linux.hpp`, and won't be included transitively by `os.hpp` > > I have done a prototype here. Only Linux/x64 is working. I'll probably discover some problems when doing other os/cpus > > [master...iklam:jdk:xxtemp-normalize-os-hpp-inclusion](https://github.com/openjdk/jdk/compare/master...iklam:jdk:xxtemp-normalize-os-hpp-inclusion) > > If people think this is worth pursuing, I will probably close this PR and open a new one, because the changes are much larger than I originally anticipated. I like everything about this proposal :-) os.hpp and friends have a long history, a lot of ports started by copy-pasting parts around, and it's common property without a clear maintainer. That could explain its current state. One small addition: functions that are clearly both declared and used within os_xxx.cpp and nowhere else I would like to have removed from headers completely and converted to local statics. Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/9423 From aph at openjdk.org Wed Jul 13 09:07:44 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 13 Jul 2022 09:07:44 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: <8J1pdGT6UmuSWcf1wk5qfqxEK_MyBdDhBaKzXGWFSJg=.37a25dbe-bde2-487f-be22-4fbcff32ee95@github.com> References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> <8J1pdGT6UmuSWcf1wk5qfqxEK_MyBdDhBaKzXGWFSJg=.37a25dbe-bde2-487f-be22-4fbcff32ee95@github.com> Message-ID: On Tue, 12 Jul 2022 12:10:45 GMT, Andrew Dinn wrote: > Well, normally I would agree. However, I'm balancing this against the problem of duplicating some very complex decision logic.
That duplication is a problem because it needs to stay the same in both occurrences if any of this ever gets changed and needs to be visibly the same to any passing programmer even when it does not get changed. OK, I'm on it. > Another possibility would be to have the dispatch logic simply return a unique enum value for each of the various distinguished leading instruction sub-cases and then have the retrieve and patch methods do a second switch on the returned enum value. That's an interesting idea, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From bulasevich at openjdk.org Wed Jul 13 12:59:15 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 13 Jul 2022 12:59:15 GMT Subject: RFR: 8288477: nmethod header size reduction [v2] In-Reply-To: References: Message-ID: > Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. > > The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime > > Renaissance benchmarks shows no performance regressions on x86 and aarch. 
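The effect of sorting fields from largest to smallest can be sketched with two small structs. This is only an illustration under assumptions: `UnsortedHeader`, `SortedHeader` and `bytes_saved` are hypothetical names, the field set is invented, and exact sizes depend on the target ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header with small and large members interleaved, so the
// compiler has to insert padding to keep the pointers aligned.
struct UnsortedHeader {
  std::int32_t kind;        // 4 bytes, usually followed by padding
  void*        code_begin;
  bool         flag_a;      // 1 byte, usually followed by padding
  void*        code_end;
  bool         flag_b;      // 1 byte plus tail padding
};

// Same information, members ordered from largest to smallest, and `kind`
// narrowed to one byte (assuming its values fit in 8 bits).
struct SortedHeader {
  void*        code_begin;
  void*        code_end;
  std::uint8_t kind;
  bool         flag_a;
  bool         flag_b;
};

static_assert(sizeof(SortedHeader) <= sizeof(UnsortedHeader),
              "reordering should never make the struct larger here");

// Number of bytes recovered per object by the reordering.
std::size_t bytes_saved() {
  assert(sizeof(UnsortedHeader) >= sizeof(SortedHeader));
  return sizeof(UnsortedHeader) - sizeof(SortedHeader);
}
```

On a typical 64-bit target the unsorted layout carries several alignment holes that the sorted layout avoids, which is the same kind of saving `ptype /o` reports for the real classes.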
> > BEFORE: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > /* 8 | 4 */ const CompilerType _type; <<<< > /* 12 | 4 */ int _size; > /* 16 | 4 */ int _header_size; > /* 20 | 4 */ int _frame_complete_offset; > /* 24 | 4 */ int _data_offset; > /* 28 | 4 */ int _frame_size; > /* 32 | 8 */ address _code_begin; > /* 40 | 8 */ address _code_end; > /* 48 | 8 */ address _content_begin; > /* 56 | 8 */ address _data_end; > /* 64 | 8 */ address _relocation_begin; > /* 72 | 8 */ address _relocation_end; > /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 88 | 1 */ bool _caller_must_gc_arguments; > /* 89 | 1 */ bool _is_compiled; > /* XXX 6-byte hole */ > /* 96 | 8 */ const char *_name; > /* 104 | 8 */ class AsmRemarks { > /* 104 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 112 | 8 */ class DbgStrings { > /* 112 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 120 */ > } > > AFTER: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > protected: > /* 8 | 8 */ address _code_begin; > /* 16 | 8 */ address _code_end; > /* 24 | 8 */ address _content_begin; > /* 32 | 8 */ address _data_end; > /* 40 | 8 */ address _relocation_begin; > /* 48 | 8 */ address _relocation_end; > /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 64 | 8 */ const char *_name; > /* 72 | 4 */ int _size; > /* 76 | 4 */ int _header_size; > /* 80 | 4 */ int _frame_complete_offset; > /* 84 | 4 */ int _data_offset; > /* 88 | 4 */ int _frame_size; > /* 92 | 1 */ bool _caller_must_gc_arguments; > /* 93 | 1 */ bool _is_compiled; > /* 94 | 1 */ const CompilerType _type; <<<< > /* XXX 1-byte hole */ > /* 96 | 8 */ class AsmRemarks { > /* 96 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 104 | 8 */ class DbgStrings { > /* 104 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 112 */ > } > > BEFORE: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public 
CompiledMethod { > private: > /* 208 | 4 */ int _entry_bci; > /* XXX 4-byte hole */ > /* 216 | 8 */ uint64_t _gc_epoch; > /* 224 | 8 */ nmethod *_osr_link; > /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 240 | 8 */ address _entry_point; > /* 248 | 8 */ address _verified_entry_point; > /* 256 | 8 */ address _osr_entry_point; > /* 264 | 4 */ int _exception_offset; > /* 268 | 4 */ int _unwind_handler_offset; > /* 272 | 4 */ int _consts_offset; > /* 276 | 4 */ int _stub_offset; > /* 280 | 4 */ int _oops_offset; > /* 284 | 4 */ int _metadata_offset; > /* 288 | 4 */ int _scopes_data_offset; > /* 292 | 4 */ int _scopes_pcs_offset; > /* 296 | 4 */ int _dependencies_offset; > /* 300 | 4 */ int _handler_table_offset; > /* 304 | 4 */ int _nul_chk_table_offset; > /* 308 | 4 */ int _speculations_offset; > /* 312 | 4 */ int _jvmci_data_offset; > /* 316 | 4 */ int _nmethod_end_offset; > /* 320 | 4 */ int _orig_pc_offset; > /* 324 | 4 */ int _compile_id; > /* 328 | 4 */ int _comp_level; <<<< > /* 332 | 1 */ bool _has_flushed_dependencies; > /* 333 | 1 */ bool _unload_reported; > /* 334 | 1 */ bool _load_reported; > /* 335 | 1 */ volatile signed char _state; > /* 336 | 1 */ bool _oops_are_stale; > /* XXX 3-byte hole */ > /* 340 | 4 */ RTMState _rtm_state; > /* 344 | 4 */ volatile jint _lock_count; > /* XXX 4-byte hole */ > /* 352 | 8 */ volatile int64_t _stack_traversal_mark; > /* 360 | 4 */ int _hotness_counter; > /* 364 | 1 */ volatile uint8_t _is_unloading_state; > /* XXX 3-byte hole */ > /* 368 | 4 */ ByteSize _native_receiver_sp_offset; > /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; > > /* total size (bytes): 376 */ > } > > AFTER: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public CompiledMethod { > /* 200 | 8 */ uint64_t _gc_epoch; > /* 208 | 8 */ volatile int64_t _stack_traversal_mark; > /* 216 | 8 */ nmethod *_osr_link; > /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 232 | 8 */ 
address _entry_point; > /* 240 | 8 */ address _verified_entry_point; > /* 248 | 8 */ address _osr_entry_point; > /* 256 | 4 */ int _entry_bci; > /* 260 | 4 */ int _exception_offset; > /* 264 | 4 */ int _unwind_handler_offset; > /* 268 | 4 */ int _consts_offset; > /* 272 | 4 */ int _stub_offset; > /* 276 | 4 */ int _oops_offset; > /* 280 | 4 */ int _metadata_offset; > /* 284 | 4 */ int _scopes_data_offset; > /* 288 | 4 */ int _scopes_pcs_offset; > /* 292 | 4 */ int _dependencies_offset; > /* 296 | 4 */ int _handler_table_offset; > /* 300 | 4 */ int _nul_chk_table_offset; > /* 304 | 4 */ int _speculations_offset; > /* 308 | 4 */ int _jvmci_data_offset; > /* 312 | 4 */ int _nmethod_end_offset; > /* 316 | 4 */ int _orig_pc_offset; > /* 320 | 4 */ int _compile_id; > /* 324 | 4 */ RTMState _rtm_state; > /* 328 | 4 */ volatile jint _lock_count; > /* 332 | 4 */ int _hotness_counter; > /* 336 | 4 */ ByteSize _native_receiver_sp_offset; > /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; > /* 344 | 1 */ CompLevel _comp_level; <<<< > /* 345 | 1 */ volatile uint8_t _is_unloading_state; > /* 346 | 1 */ bool _has_flushed_dependencies; > /* 347 | 1 */ bool _unload_reported; > /* 348 | 1 */ bool _load_reported; > /* 349 | 1 */ volatile signed char _state; > /* 350 | 1 */ bool _oops_are_stale; > > /* total size (bytes): 352 */ > } Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: Undo applying CompLevel where applicable. 
It must be a separate change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9165/files - new: https://git.openjdk.org/jdk/pull/9165/files/644be419..485b7250 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9165&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9165&range=00-01 Stats: 72 lines in 16 files changed: 0 ins; 0 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/9165.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9165/head:pull/9165 PR: https://git.openjdk.org/jdk/pull/9165 From jbhateja at openjdk.org Wed Jul 13 16:49:06 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Jul 2022 16:49:06 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 21:03:30 GMT, Jatin Bhateja wrote: >> - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. >> - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8290066: Removing newly added white listed options. @chhagedorn , can you kindly review and approve. 
------------- PR: https://git.openjdk.org/jdk/pull/9452 From itakiguchi at openjdk.org Wed Jul 13 17:02:30 2022 From: itakiguchi at openjdk.org (Ichiroh Takiguchi) Date: Wed, 13 Jul 2022 17:02:30 GMT Subject: RFR: 8290218: AIX build failure by JDK-8289780 Message-ID: AIX build was failed by following messages: * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_link: ld: 0711-317 ERROR: Undefined symbol: collector_func_load ld: 0711-317 ERROR: Undefined symbol: .collector_func_load ld: 0711-344 See the loadmap file /home/jdkbld/git/jdk/build/aix-ppc64-server-release/hotspot/variant-server/libjvm/objs/libjvm.loadmap for more information. In my investigation, [JDK-8289780](https://bugs.openjdk.org/browse/JDK-8289780) affects this issue on src/hotspot/share/prims/forte.cpp. ------------- Commit messages: - 8290218: AIX build failure by JDK-8289780 Changes: https://git.openjdk.org/jdk/pull/9482/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9482&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290218 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9482.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9482/head:pull/9482 PR: https://git.openjdk.org/jdk/pull/9482 From iklam at openjdk.org Wed Jul 13 19:41:07 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 13 Jul 2022 19:41:07 GMT Subject: RFR: 8290218: AIX build failure by JDK-8289780 In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 16:55:38 GMT, Ichiroh Takiguchi wrote: > AIX build was failed by following messages: > > * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_link: > ld: 0711-317 ERROR: Undefined symbol: collector_func_load > ld: 0711-317 ERROR: Undefined symbol: .collector_func_load > ld: 0711-344 See the loadmap file /home/jdkbld/git/jdk/build/aix-ppc64-server-release/hotspot/variant-server/libjvm/objs/libjvm.loadmap for more information. 
> > In my investigation, [JDK-8289780](https://bugs.openjdk.org/browse/JDK-8289780) affects this issue on src/hotspot/share/prims/forte.cpp. Changes requested by iklam (Reviewer). src/hotspot/share/prims/forte.cpp line 670: > 668: // Because it is weakly bound, the calls become NOP's when the library > 669: // isn't present. > 670: #if defined(__APPLE__) I think this can be combined with the condition below: #if defined(__APPLE__) || defined(_AIX) I am curious why `collector_func_load` worked before but `collector_func_load_enabled()` doesn't work, since they are using the same pattern. Anyway, I think the new change is better, since the profiling tool that requires `collector_func_load` isn't available on AIX. ------------- PR: https://git.openjdk.org/jdk/pull/9482 From bulasevich at openjdk.org Wed Jul 13 20:39:55 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 13 Jul 2022 20:39:55 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: > Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. > > The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime > > Renaissance benchmarks shows no performance regressions on x86 and aarch. 
> > BEFORE: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > /* 8 | 4 */ const CompilerType _type; <<<< > /* 12 | 4 */ int _size; > /* 16 | 4 */ int _header_size; > /* 20 | 4 */ int _frame_complete_offset; > /* 24 | 4 */ int _data_offset; > /* 28 | 4 */ int _frame_size; > /* 32 | 8 */ address _code_begin; > /* 40 | 8 */ address _code_end; > /* 48 | 8 */ address _content_begin; > /* 56 | 8 */ address _data_end; > /* 64 | 8 */ address _relocation_begin; > /* 72 | 8 */ address _relocation_end; > /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 88 | 1 */ bool _caller_must_gc_arguments; > /* 89 | 1 */ bool _is_compiled; > /* XXX 6-byte hole */ > /* 96 | 8 */ const char *_name; > /* 104 | 8 */ class AsmRemarks { > /* 104 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 112 | 8 */ class DbgStrings { > /* 112 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 120 */ > } > > AFTER: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > protected: > /* 8 | 8 */ address _code_begin; > /* 16 | 8 */ address _code_end; > /* 24 | 8 */ address _content_begin; > /* 32 | 8 */ address _data_end; > /* 40 | 8 */ address _relocation_begin; > /* 48 | 8 */ address _relocation_end; > /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 64 | 8 */ const char *_name; > /* 72 | 4 */ int _size; > /* 76 | 4 */ int _header_size; > /* 80 | 4 */ int _frame_complete_offset; > /* 84 | 4 */ int _data_offset; > /* 88 | 4 */ int _frame_size; > /* 92 | 1 */ bool _caller_must_gc_arguments; > /* 93 | 1 */ bool _is_compiled; > /* 94 | 1 */ const CompilerType _type; <<<< > /* XXX 1-byte hole */ > /* 96 | 8 */ class AsmRemarks { > /* 96 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 104 | 8 */ class DbgStrings { > /* 104 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 112 */ > } > > BEFORE: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public 
CompiledMethod { > private: > /* 208 | 4 */ int _entry_bci; > /* XXX 4-byte hole */ > /* 216 | 8 */ uint64_t _gc_epoch; > /* 224 | 8 */ nmethod *_osr_link; > /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 240 | 8 */ address _entry_point; > /* 248 | 8 */ address _verified_entry_point; > /* 256 | 8 */ address _osr_entry_point; > /* 264 | 4 */ int _exception_offset; > /* 268 | 4 */ int _unwind_handler_offset; > /* 272 | 4 */ int _consts_offset; > /* 276 | 4 */ int _stub_offset; > /* 280 | 4 */ int _oops_offset; > /* 284 | 4 */ int _metadata_offset; > /* 288 | 4 */ int _scopes_data_offset; > /* 292 | 4 */ int _scopes_pcs_offset; > /* 296 | 4 */ int _dependencies_offset; > /* 300 | 4 */ int _handler_table_offset; > /* 304 | 4 */ int _nul_chk_table_offset; > /* 308 | 4 */ int _speculations_offset; > /* 312 | 4 */ int _jvmci_data_offset; > /* 316 | 4 */ int _nmethod_end_offset; > /* 320 | 4 */ int _orig_pc_offset; > /* 324 | 4 */ int _compile_id; > /* 328 | 4 */ int _comp_level; <<<< > /* 332 | 1 */ bool _has_flushed_dependencies; > /* 333 | 1 */ bool _unload_reported; > /* 334 | 1 */ bool _load_reported; > /* 335 | 1 */ volatile signed char _state; > /* 336 | 1 */ bool _oops_are_stale; > /* XXX 3-byte hole */ > /* 340 | 4 */ RTMState _rtm_state; > /* 344 | 4 */ volatile jint _lock_count; > /* XXX 4-byte hole */ > /* 352 | 8 */ volatile int64_t _stack_traversal_mark; > /* 360 | 4 */ int _hotness_counter; > /* 364 | 1 */ volatile uint8_t _is_unloading_state; > /* XXX 3-byte hole */ > /* 368 | 4 */ ByteSize _native_receiver_sp_offset; > /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; > > /* total size (bytes): 376 */ > } > > AFTER: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public CompiledMethod { > /* 200 | 8 */ uint64_t _gc_epoch; > /* 208 | 8 */ volatile int64_t _stack_traversal_mark; > /* 216 | 8 */ nmethod *_osr_link; > /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 232 | 8 */ 
address _entry_point; > /* 240 | 8 */ address _verified_entry_point; > /* 248 | 8 */ address _osr_entry_point; > /* 256 | 4 */ int _entry_bci; > /* 260 | 4 */ int _exception_offset; > /* 264 | 4 */ int _unwind_handler_offset; > /* 268 | 4 */ int _consts_offset; > /* 272 | 4 */ int _stub_offset; > /* 276 | 4 */ int _oops_offset; > /* 280 | 4 */ int _metadata_offset; > /* 284 | 4 */ int _scopes_data_offset; > /* 288 | 4 */ int _scopes_pcs_offset; > /* 292 | 4 */ int _dependencies_offset; > /* 296 | 4 */ int _handler_table_offset; > /* 300 | 4 */ int _nul_chk_table_offset; > /* 304 | 4 */ int _speculations_offset; > /* 308 | 4 */ int _jvmci_data_offset; > /* 312 | 4 */ int _nmethod_end_offset; > /* 316 | 4 */ int _orig_pc_offset; > /* 320 | 4 */ int _compile_id; > /* 324 | 4 */ RTMState _rtm_state; > /* 328 | 4 */ volatile jint _lock_count; > /* 332 | 4 */ int _hotness_counter; > /* 336 | 4 */ ByteSize _native_receiver_sp_offset; > /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; > /* 344 | 1 */ CompLevel _comp_level; <<<< > /* 345 | 1 */ volatile uint8_t _is_unloading_state; > /* 346 | 1 */ bool _has_flushed_dependencies; > /* 347 | 1 */ bool _unload_reported; > /* 348 | 1 */ bool _load_reported; > /* 349 | 1 */ volatile signed char _state; > /* 350 | 1 */ bool _oops_are_stale; > > /* total size (bytes): 352 */ > } Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Undo applying CompLevel where applicable. 
It must be a separate change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9165/files - new: https://git.openjdk.org/jdk/pull/9165/files/485b7250..2d2a07af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9165&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9165&range=01-02 Stats: 8 lines in 5 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9165.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9165/head:pull/9165 PR: https://git.openjdk.org/jdk/pull/9165 From itakiguchi at openjdk.org Thu Jul 14 00:42:55 2022 From: itakiguchi at openjdk.org (Ichiroh Takiguchi) Date: Thu, 14 Jul 2022 00:42:55 GMT Subject: RFR: 8290218: AIX build failure by JDK-8289780 [v2] In-Reply-To: References: Message-ID: > AIX build was failed by following messages: > > * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_link: > ld: 0711-317 ERROR: Undefined symbol: collector_func_load > ld: 0711-317 ERROR: Undefined symbol: .collector_func_load > ld: 0711-344 See the loadmap file /home/jdkbld/git/jdk/build/aix-ppc64-server-release/hotspot/variant-server/libjvm/objs/libjvm.loadmap for more information. > > In my investigation, [JDK-8289780](https://bugs.openjdk.org/browse/JDK-8289780) affects this issue on src/hotspot/share/prims/forte.cpp. 
Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: 8290218: AIX build failure by JDK-8289780 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9482/files - new: https://git.openjdk.org/jdk/pull/9482/files/c3079ad3..6bd4b154 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9482&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9482&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9482.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9482/head:pull/9482 PR: https://git.openjdk.org/jdk/pull/9482 From fyang at openjdk.org Thu Jul 14 01:55:01 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 14 Jul 2022 01:55:01 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 08:14:05 GMT, Fei Yang wrote: > Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call the addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit code for a load + add/sub + store sequence adding an arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. > > We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that the code logic is clearer at the call sites. > > Testing: tier1 tested on riscv64-linux unmatched board. Could we have a Reviewer please? Maybe @shipilev ?
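The 12-bit constraint and the load + add + store idea from the description above can be modelled in plain C++. This is a sketch, not the port's code: `fits_simm12` and `add_to_memory` are hypothetical names, and the function models the behaviour of the emitted sequence rather than emitting instructions.

```cpp
#include <cassert>
#include <cstdint>

// True if `value` fits a sign-extended 12-bit immediate, i.e. the range
// [-2048, 2047]. Hypothetical helper, not the name used in HotSpot.
bool fits_simm12(std::int64_t value) {
  return value >= -2048 && value <= 2047;
}

// Models the load + add + store sequence. The increment is an ordinary
// value here (it would sit in a scratch register in generated code), so
// it is not limited to 12 bits.
void add_to_memory(std::int64_t* addr, std::int64_t increment) {
  assert(addr != nullptr);
  std::int64_t tmp = *addr;  // load
  tmp += increment;          // add, register-register for large increments
  *addr = tmp;               // store
}
```

In this model the 12-bit check would apply to the address offset only, while the increment itself can be any 64-bit value.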
------------- PR: https://git.openjdk.org/jdk/pull/9461 From iklam at openjdk.org Thu Jul 14 02:54:59 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 14 Jul 2022 02:54:59 GMT Subject: RFR: 8290218: AIX build failure by JDK-8289780 [v2] In-Reply-To: References: Message-ID: <8URXlYY2my6ky8cXiJ8dKOFB9fMH0EF-b2AOIJBedog=.3167e9ed-b328-4b78-8fc4-3e6a1190bc0c@github.com> On Thu, 14 Jul 2022 00:42:55 GMT, Ichiroh Takiguchi wrote: >> AIX build was failed by following messages: >> >> * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_link: >> ld: 0711-317 ERROR: Undefined symbol: collector_func_load >> ld: 0711-317 ERROR: Undefined symbol: .collector_func_load >> ld: 0711-344 See the loadmap file /home/jdkbld/git/jdk/build/aix-ppc64-server-release/hotspot/variant-server/libjvm/objs/libjvm.loadmap for more information. >> >> In my investigation, [JDK-8289780](https://bugs.openjdk.org/browse/JDK-8289780) affects this issue on src/hotspot/share/prims/forte.cpp. > > Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: > > 8290218: AIX build failure by JDK-8289780 LGTM and this can be considered as a trivial change that requires only one review. ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.org/jdk/pull/9482 From itakiguchi at openjdk.org Thu Jul 14 02:55:01 2022 From: itakiguchi at openjdk.org (Ichiroh Takiguchi) Date: Thu, 14 Jul 2022 02:55:01 GMT Subject: RFR: 8290218: AIX build failure by JDK-8289780 [v2] In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 19:37:50 GMT, Ioi Lam wrote: >> Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: >> >> 8290218: AIX build failure by JDK-8289780 > > Changes requested by iklam (Reviewer). Hello @iklam . I appreciate your comment. forte.cpp was updated. Please review the file again. 
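For readers unfamiliar with the pattern under review, the combined platform guard around a weakly bound symbol could look roughly like the sketch below. It is not the contents of forte.cpp. `demo_profiler_hook`, `DEMO_PROFILER_HOOK` and `notify_loaded` are hypothetical names, and the sketch assumes GCC or Clang for `__attribute__((weak))`.

```cpp
#include <cassert>

// `demo_profiler_hook` is a hypothetical function that an optional profiling
// library might export. It is deliberately not defined anywhere in this file.
#if defined(__APPLE__) || defined(_AIX)
// Platforms where the optional library is not expected: compile calls away.
#define DEMO_PROFILER_HOOK(name) ((void)0)
#else
extern "C" void demo_profiler_hook(const char* name) __attribute__((weak));
// Weakly bound: the symbol's address is null when no library provides it,
// so the call is skipped at run time.
#define DEMO_PROFILER_HOOK(name) \
  (demo_profiler_hook ? demo_profiler_hook(name) : (void)0)
#endif

// Reports a load event to the profiler if one is present; returns 0 always.
int notify_loaded(const char* name) {
  assert(name != nullptr);
  DEMO_PROFILER_HOOK(name);
  return 0;
}
```

With no profiling library linked in, `notify_loaded` does nothing beyond the null check on either branch of the guard.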
------------- PR: https://git.openjdk.org/jdk/pull/9482 From stuefe at openjdk.org Thu Jul 14 03:39:58 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 14 Jul 2022 03:39:58 GMT Subject: RFR: 8290218: AIX build failure by JDK-8289780 [v2] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 00:42:55 GMT, Ichiroh Takiguchi wrote: >> AIX build was failed by following messages: >> >> * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_link: >> ld: 0711-317 ERROR: Undefined symbol: collector_func_load >> ld: 0711-317 ERROR: Undefined symbol: .collector_func_load >> ld: 0711-344 See the loadmap file /home/jdkbld/git/jdk/build/aix-ppc64-server-release/hotspot/variant-server/libjvm/objs/libjvm.loadmap for more information. >> >> In my investigation, [JDK-8289780](https://bugs.openjdk.org/browse/JDK-8289780) affects this issue on src/hotspot/share/prims/forte.cpp. > > Ichiroh Takiguchi has updated the pull request incrementally with one additional commit since the last revision: > > 8290218: AIX build failure by JDK-8289780 LGTM ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/9482 From itakiguchi at openjdk.org Thu Jul 14 04:39:57 2022 From: itakiguchi at openjdk.org (Ichiroh Takiguchi) Date: Thu, 14 Jul 2022 04:39:57 GMT Subject: Integrated: 8290218: AIX build failure by JDK-8289780 In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 16:55:38 GMT, Ichiroh Takiguchi wrote: > AIX build was failed by following messages: > > * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_link: > ld: 0711-317 ERROR: Undefined symbol: collector_func_load > ld: 0711-317 ERROR: Undefined symbol: .collector_func_load > ld: 0711-344 See the loadmap file /home/jdkbld/git/jdk/build/aix-ppc64-server-release/hotspot/variant-server/libjvm/objs/libjvm.loadmap for more information. > > In my investigation, [JDK-8289780](https://bugs.openjdk.org/browse/JDK-8289780) affects this issue on src/hotspot/share/prims/forte.cpp. 
This pull request has now been integrated. Changeset: 5d588eda Author: Ichiroh Takiguchi URL: https://git.openjdk.org/jdk/commit/5d588eda97aeab0c8fda6ad8d332d6a4cae31b05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8290218: AIX build failure by JDK-8289780 Reviewed-by: iklam, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/9482 From dholmes at openjdk.org Thu Jul 14 05:57:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Jul 2022 05:57:04 GMT Subject: RFR: JDK-8290020 Deadlock in leakprofiler::emit_events during shutdown [v2] In-Reply-To: References: <0h6r1AuS7NK-oWQpZsJ7hEel5jWZ3rpT_7rvF5S-SNQ=.d270906d-d503-4d2d-98d3-5f4e73f7258f@github.com> <8_FCR5Z1MjBhkKGDIKjweQT9Sc2T7DV9zBRT83SuzPY=.9938b423-29e9-4283-ac0d-8a0f044127a5@github.com> Message-ID: <-BLxabxeUywPpKC8fhQfrZXqbHFlimlSdiWLOBO0-oU=.58ad449b-b671-4db7-82ee-03ce4800ed81@github.com> On Tue, 12 Jul 2022 13:23:38 GMT, Markus Grönlund wrote: >> Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: >> >> rm CrasherHald test > > Marked as reviewed by mgronlun (Reviewer). @mgronlun or @egahlin do you have any feedback on my query above? ------------- PR: https://git.openjdk.org/jdk/pull/9465 From fjiang at openjdk.org Thu Jul 14 07:50:47 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 14 Jul 2022 07:50:47 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 08:14:05 GMT, Fei Yang wrote: > Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call the addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit code for a load + add/sub + store sequence adding an arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available.
> > We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that the code logic is clearer at the call sites. > > Testing: tier1 tested on riscv64-linux unmatched board. Marked as reviewed by fjiang (Author). ------------- PR: https://git.openjdk.org/jdk/pull/9461 From fjiang at openjdk.org Thu Jul 14 07:53:44 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 14 Jul 2022 07:53:44 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter Message-ID: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on the riscv backend: 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in the native ABI [1], so we choose x19 for this patch. 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. 3. Related to 1, we should clearly label all the places where the caller's SP is passed to a callee. [1].
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc Additional tests: - hotspot/jdk tier1 on QEMU with Release JDK - hotspot tier1 on HiFive Unmatched board with Release JDK - hotspot tier1 on QEMU with Fastdebug JDK ------------- Commit messages: - riscv: Clean up stack and register handling in interpreter Changes: https://git.openjdk.org/jdk/pull/9487/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9487&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290280 Stats: 145 lines in 11 files changed: 69 ins; 34 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/9487.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9487/head:pull/9487 PR: https://git.openjdk.org/jdk/pull/9487 From aph at openjdk.org Thu Jul 14 08:35:12 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 14 Jul 2022 08:35:12 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter In-Reply-To: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: On Thu, 14 Jul 2022 07:42:57 GMT, Feilong Jiang wrote: > As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on riscv backend: > > 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in native ABI [1], we choose x19 for this patch. > 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. > 3. Relate to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > [1]. 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Additional tests: > - hotspot/jdk tier1 on QEMU with Release JDK > - hotspot tier1 on HiFive Unmatched board with Release JDK > - hotspot tier1 on QEMU with Fastdebug JDK This looks reasonable. Don't forget that JDK-8289698 will be needed when you have support for virtual threads. ------------- PR: https://git.openjdk.org/jdk/pull/9487 From kbarrett at openjdk.org Thu Jul 14 08:41:36 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 14 Jul 2022 08:41:36 GMT Subject: RFR: 8290290: Remove addition of TimeInstants Message-ID: Please review this trivial change to remove the unused and semantically suspect function TimeInstant::operator+(TimeInstant). ------------- Commit messages: - remove addition of TimeInstants Changes: https://git.openjdk.org/jdk/pull/9489/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9489&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290290 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9489.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9489/head:pull/9489 PR: https://git.openjdk.org/jdk/pull/9489 From fjiang at openjdk.org Thu Jul 14 08:58:00 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 14 Jul 2022 08:58:00 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter In-Reply-To: References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: On Thu, 14 Jul 2022 08:31:16 GMT, Andrew Haley wrote: > This looks reasonable. Don't forget that JDK-8289698 will be needed when you have support for virtual threads. @theRealAph Thanks for your kind reminder, we will take care of it. 
------------- PR: https://git.openjdk.org/jdk/pull/9487 From egahlin at openjdk.org Thu Jul 14 09:18:05 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 14 Jul 2022 09:18:05 GMT Subject: RFR: JDK-8290020 Deadlock in leakprofiler::emit_events during shutdown [v2] In-Reply-To: References: <0h6r1AuS7NK-oWQpZsJ7hEel5jWZ3rpT_7rvF5S-SNQ=.d270906d-d503-4d2d-98d3-5f4e73f7258f@github.com> <8_FCR5Z1MjBhkKGDIKjweQT9Sc2T7DV9zBRT83SuzPY=.9938b423-29e9-4283-ac0d-8a0f044127a5@github.com> Message-ID: On Tue, 12 Jul 2022 13:23:38 GMT, Markus Grönlund wrote: >> Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: >> >> rm CrasherHald test > > Marked as reviewed by mgronlun (Reviewer). > @mgronlun or @egahlin do you have any feedback on my query above? The emergency dump was added because we could not execute Java code during a JVM crash and the code to extract a recording file from a core dump was hard to maintain. The functionality was never meant to be triggered from Java. One could argue the emergency dump should happen if somebody calls Runtime::halt(int), but I don't think that's what users want. Most likely they want a fast exit. If they call System.exit(int), JFR will do an ordinary dump in the shutdown hook, assuming JFR has been set up to dump on exit. ------------- PR: https://git.openjdk.org/jdk/pull/9465 From mdoerr at openjdk.org Thu Jul 14 09:33:58 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 14 Jul 2022 09:33:58 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 07:51:01 GMT, Fei Gao wrote: > Superword doesn't vectorize any nodes of non-primitive types and > thus sets `allow_address` false when calling type2aelembytes() in > SuperWord::data_size()[1]. Therefore, when we try to resolve the > data size for a node of T_ADDRESS type, the assertion in > type2aelembytes()[2] takes effect.
> > We try to resolve the data sizes for node s and node t in the > SuperWord::adjust_alignment_for_type_conversion()[3] when type > conversion between different data sizes happens. The issue is, > when node s is a ConvI2L node and node t is an AddP node of > T_ADDRESS type, type2aelembytes() will assert. To fix it, we > should filter out all non-primitive nodes, like the patch does > in SuperWord::adjust_alignment_for_type_conversion(). Since > it's a failure in the mid-end, all superword available platforms > are affected. In my local test, this failure can be reproduced > on both x86 and aarch64. With this patch, the failure can be fixed. > > Apart from fixing the bug, the patch also adds necessary type check > and does some clean-up in SuperWord::longer_type_for_conversion() > and VectorCastNode::implemented(). > > [1]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1417 > [2]https://github.com/openjdk/jdk/blob/b96ba19807845739b36274efb168dd048db819a3/src/hotspot/share/utilities/globalDefinitions.cpp#L326 > [3]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1454 Your fix LGTM. The test doesn't show the problem on PPC64, but my original replay file has worked to verify the fix. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/9391 From stuefe at openjdk.org Thu Jul 14 10:17:02 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 14 Jul 2022 10:17:02 GMT Subject: RFR: 8290290: Remove addition of TimeInstants In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 08:34:27 GMT, Kim Barrett wrote: > Please review this trivial change to remove the unused and semantically suspect > function TimeInstant::operator+(TimeInstant). Looks good and trivial. ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9489 From mgronlun at openjdk.org Thu Jul 14 10:20:03 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 14 Jul 2022 10:20:03 GMT Subject: RFR: JDK-8290020 Deadlock in leakprofiler::emit_events during shutdown [v2] In-Reply-To: <8_FCR5Z1MjBhkKGDIKjweQT9Sc2T7DV9zBRT83SuzPY=.9938b423-29e9-4283-ac0d-8a0f044127a5@github.com> References: <0h6r1AuS7NK-oWQpZsJ7hEel5jWZ3rpT_7rvF5S-SNQ=.d270906d-d503-4d2d-98d3-5f4e73f7258f@github.com> <8_FCR5Z1MjBhkKGDIKjweQT9Sc2T7DV9zBRT83SuzPY=.9938b423-29e9-4283-ac0d-8a0f044127a5@github.com> Message-ID: On Tue, 12 Jul 2022 13:22:35 GMT, Ludvig Janiuk wrote: >> Add a boolean parameter to Jfr::on_vm_shutdown to differentiate the "called from java" case, and in that case to not call JfrEmergencyDump::on_vm_shutdown. > > Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: > > rm CrasherHald test Runtime.halt() is tricky in that this method does not cause shutdown hooks to be started. If the shutdown sequence has already been initiated then this method does not wait for any running shutdown hooks to finish their work. This means it could deadlock with the normal Leakprofiler dump logic, issued either by the thread that stops a recording or, if the shutdown hooks have already started, by the shutdown hook. It was an oversight to not exclude Runtime.halt() from calling into the emergency dump logic. One could argue that it could be distinguished in that it does not pass the exception handler, so we are not crashing. Unfortunately, that is only one part of the problem because we also attempt to emit the emergency dump in cases of out-of-memory, which is very hard to coordinate correctly.
------------- PR: https://git.openjdk.org/jdk/pull/9465 From aph at openjdk.org Thu Jul 14 12:49:27 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 14 Jul 2022 12:49:27 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v4] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/92016cb1..ed2a1bb6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=02-03 Stats: 812 lines in 1 file changed: 547 ins; 187 del; 78 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From coleenp at openjdk.org Thu Jul 14 12:58:29 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Jul 2022 12:58:29 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception Message-ID: I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. 
------------- Commit messages: - 8272096: Exceptions::new_exception can return wrong exception Changes: https://git.openjdk.org/jdk/pull/9492/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9492&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8272096 Stats: 29 lines in 4 files changed: 7 ins; 5 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/9492.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9492/head:pull/9492 PR: https://git.openjdk.org/jdk/pull/9492 From dholmes at openjdk.org Thu Jul 14 13:24:00 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Jul 2022 13:24:00 GMT Subject: RFR: 8290290: Remove addition of TimeInstants In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 08:34:27 GMT, Kim Barrett wrote: > Please review this trivial change to remove the unused and semantically suspect > function TimeInstant::operator+(TimeInstant). Totally agree: adding time instants is meaningless. Looks good and trivial (as Thomas already said). Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9489 From dholmes at openjdk.org Thu Jul 14 13:28:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Jul 2022 13:28:04 GMT Subject: RFR: JDK-8290020 Deadlock in leakprofiler::emit_events during shutdown [v2] In-Reply-To: <8_FCR5Z1MjBhkKGDIKjweQT9Sc2T7DV9zBRT83SuzPY=.9938b423-29e9-4283-ac0d-8a0f044127a5@github.com> References: <0h6r1AuS7NK-oWQpZsJ7hEel5jWZ3rpT_7rvF5S-SNQ=.d270906d-d503-4d2d-98d3-5f4e73f7258f@github.com> <8_FCR5Z1MjBhkKGDIKjweQT9Sc2T7DV9zBRT83SuzPY=.9938b423-29e9-4283-ac0d-8a0f044127a5@github.com> Message-ID: On Tue, 12 Jul 2022 13:22:35 GMT, Ludvig Janiuk wrote: >> Add a boolean parameter to Jfr::on_vm_shutdown to differentiate the "called from java" case, and in that case to not call JfrEmergencyDump::on_vm_shutdown. 
> > Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: > > rm CrasherHald test Okay so the issue is really that the emergency dump should never run concurrently with the regular dump. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/9465 From mbaesken at openjdk.org Thu Jul 14 14:26:01 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 14 Jul 2022 14:26:01 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v3] In-Reply-To: References: Message-ID: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Bring back JitRestart event, add codeCacheMaxCapacity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9334/files - new: https://git.openjdk.org/jdk/pull/9334/files/29665ecb..9e4a54b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=01-02 Stats: 29 lines in 6 files changed: 19 ins; 7 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9334.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334 PR: https://git.openjdk.org/jdk/pull/9334 From kbarrett at openjdk.org Thu Jul 14 14:33:15 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 14 Jul 2022 14:33:15 GMT Subject: RFR: 8290290: Remove addition of TimeInstants [v2] In-Reply-To: References: Message-ID: > Please review this trivial change to remove the unused and semantically suspect > function TimeInstant::operator+(TimeInstant). Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into no-add-ticks - remove addition of TimeInstants ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9489/files - new: https://git.openjdk.org/jdk/pull/9489/files/21b10028..f083e572 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9489&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9489&range=00-01 Stats: 28 lines in 2 files changed: 6 ins; 18 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9489.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9489/head:pull/9489 PR: https://git.openjdk.org/jdk/pull/9489 From kbarrett at openjdk.org Thu Jul 14 14:33:16 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 14 Jul 2022 14:33:16 GMT Subject: RFR: 8290290: Remove addition of TimeInstants [v2] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 10:13:23 GMT, Thomas Stuefe wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into no-add-ticks >> - remove addition of TimeInstants > > Looks good and trivial. Thanks for reviews @tstuefe and @dholmes-ora . ------------- PR: https://git.openjdk.org/jdk/pull/9489 From kbarrett at openjdk.org Thu Jul 14 14:40:59 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 14 Jul 2022 14:40:59 GMT Subject: Integrated: 8290290: Remove addition of TimeInstants In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 08:34:27 GMT, Kim Barrett wrote: > Please review this trivial change to remove the unused and semantically suspect > function TimeInstant::operator+(TimeInstant). This pull request has now been integrated. 
Changeset: 3bb2dc8e Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/3bb2dc8e7f91061a9f1141e9b8122d00adb9faee Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod 8290290: Remove addition of TimeInstants Reviewed-by: stuefe, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/9489 From adinn at openjdk.org Thu Jul 14 14:42:08 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 14 Jul 2022 14:42:08 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> <8J1pdGT6UmuSWcf1wk5qfqxEK_MyBdDhBaKzXGWFSJg=.37a25dbe-bde2-487f-be22-4fbcff32ee95@github.com> Message-ID: On Wed, 13 Jul 2022 08:54:10 GMT, Andrew Haley wrote: > Another possibility would be to have the dispatch logic simply return a unique enum value for each of the various distinguished leading instruction sub-cases and then have the retrieve and patch methods do a second switch on the returned enum value. ... of course, rather than rely on a switch I guess a pucker OO solution would use a virtual dispatch ... :-) ------------- PR: https://git.openjdk.org/jdk/pull/9398 From coleenp at openjdk.org Thu Jul 14 15:06:49 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Jul 2022 15:06:49 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v2] In-Reply-To: References: Message-ID: > I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: second get_user_name_slow call should CHECK_NULL too. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9492/files - new: https://git.openjdk.org/jdk/pull/9492/files/d5ab48a4..daf55a4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9492&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9492&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9492.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9492/head:pull/9492 PR: https://git.openjdk.org/jdk/pull/9492 From aph at openjdk.org Thu Jul 14 15:36:40 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 14 Jul 2022 15:36:40 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v2] In-Reply-To: References: <0XC1ajNWGeuu5LIvecXbAYK-arpgWJ8Rg69PyN6o-08=.cf3828bb-5a1f-4777-b0ed-bfdf89469ade@github.com> <8J1pdGT6UmuSWcf1wk5qfqxEK_MyBdDhBaKzXGWFSJg=.37a25dbe-bde2-487f-be22-4fbcff32ee95@github.com> Message-ID: On Thu, 14 Jul 2022 14:38:35 GMT, Andrew Dinn wrote: > ... of course, rather than rely on a switch I guess a pucker OO solution would use a virtual dispatch ... :-) Funnily enough ... you'll see soon. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Thu Jul 14 16:21:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 14 Jul 2022 16:21:05 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v5] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. 
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/ed2a1bb6..bc64b79f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=03-04 Stats: 364 lines in 4 files changed: 40 ins; 307 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From lucy at openjdk.org Thu Jul 14 16:51:03 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 14 Jul 2022 16:51:03 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v3] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 14:26:01 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Bring back JitRestart event, add codeCacheMaxCapacity Changes look good. Now you can relate the freed memory to what's available in total. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.org/jdk/pull/9334 From mgronlun at openjdk.org Thu Jul 14 17:09:08 2022 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 14 Jul 2022 17:09:08 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v3] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 14:26:01 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events.
Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Bring back JitRestart event, add codeCacheMaxCapacity Changes requested by mgronlun (Reviewer). src/hotspot/share/jfr/metadata/metadata.xml line 563: > 561: > 562: > 563: Is codeCacheMaxCapacity the current in-use size of the CodeCache, in bytes? It should have the contentType="bytes" in that case. Also "freedMemory" should have the same contentType. src/hotspot/share/jfr/metadata/metadata.xml line 620: > 618: > 619: > 620: contentType="bytes" test/jdk/jdk/jfr/event/metadata/TestLookForUntestedEvents.java line 57: > 55: private static final Set hardToTestEvents = new HashSet<>( > 56: Arrays.asList( > 57: "DataLoss", "IntFlag", "ReservedStackActivation", "JitRestart", Can we create a test? There is an existing test for CodeCacheFull, maybe derive from it to create a test also for the JitRestart event? ------------- PR: https://git.openjdk.org/jdk/pull/9334 From egahlin at openjdk.org Thu Jul 14 17:17:17 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 14 Jul 2022 17:17:17 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v3] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 14:26:01 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Bring back JitRestart event, add codeCacheMaxCapacity src/hotspot/share/code/codeCache.cpp line 1369: > 1367: event.set_unallocatedCapacity(heap->unallocated_capacity()); > 1368: event.set_fullCount(heap->full_count()); > 1369: event.set_codeCacheMaxCapacity(CodeCache::max_capacity()); Add a test of the field in TestCodeCacheFull src/hotspot/share/jfr/metadata/metadata.xml line 561: > 559: > 560: > 561: "JITRestart" . The convention for other events has been to use capital letters for acronyms. "JIT restart" -> "JIT Restart" src/hotspot/share/jfr/metadata/metadata.xml line 563: > 561: > 562: > 563: Should be "Code Cache Maximum Capacity" src/hotspot/share/jfr/metadata/metadata.xml line 620: > 618: > 619: > 620: Should be "Code Cache Maximum Capacity" src/hotspot/share/runtime/sweeper.cpp line 440: > 438: log.debug("restart compiler"); > 439: log_sweep("restart_compiler"); > 440: EventJitRestart event; It would be good to have a sanity test of the event. If the event can't be provoked reliably, retry or accept as OK. ------------- PR: https://git.openjdk.org/jdk/pull/9334 From hseigel at openjdk.org Thu Jul 14 17:37:02 2022 From: hseigel at openjdk.org (Harold Seigel) Date: Thu, 14 Jul 2022 17:37:02 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v2] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 15:06:49 GMT, Coleen Phillimore wrote: >> I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7.
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. The changes look good! Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.org/jdk/pull/9492 From coleenp at openjdk.org Thu Jul 14 18:45:04 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Jul 2022 18:45:04 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v2] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 15:06:49 GMT, Coleen Phillimore wrote: >> I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. Thanks Harold! ------------- PR: https://git.openjdk.org/jdk/pull/9492 From lmesnik at openjdk.org Fri Jul 15 00:22:33 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 15 Jul 2022 00:22:33 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop Message-ID: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> The tests are updated so that they no longer use Thread.stop(). Tests whose intention is to verify asynchronous exceptions are updated to use JVMTI StopThread.
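For tests that only need a helper thread to finish, and do not care about asynchronous exceptions, a cooperative stop flag is one common replacement for Thread.stop(). The following is a minimal sketch under that assumption; the class StoppableWorker and its methods are hypothetical names chosen for illustration and are not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical worker (not from the patch): it polls a volatile flag
// instead of being terminated asynchronously with Thread.stop().
class StoppableWorker implements Runnable {
    // volatile so that the write from the controlling thread becomes visible to the worker
    private volatile boolean stop = false;
    private final AtomicLong iterations = new AtomicLong();

    @Override
    public void run() {
        // The worker checks the flag on every iteration and exits on its own.
        while (!stop) {
            iterations.incrementAndGet();
            Thread.onSpinWait();
        }
    }

    void requestStop() {
        stop = true;
    }

    long iterations() {
        return iterations.get();
    }

    // Starts a worker, waits until it has made progress, requests a stop,
    // and reports whether the thread terminated within the timeout.
    static boolean runAndStop(long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(worker, "stoppable-worker");
        t.start();
        while (worker.iterations() == 0) {
            Thread.onSpinWait();
        }
        worker.requestStop();
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }
}
```

This only works when the target thread can be modified to poll the flag. Tests that must deliver an exception at an arbitrary point in the target thread still need something like JVMTI StopThread, as described in the message above.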
------------- Commit messages: - Merge branch 'master' of https://github.com/openjdk/jdk - fix Changes: https://git.openjdk.org/jdk/pull/9505/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289612 Stats: 123 lines in 8 files changed: 104 ins; 6 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From fgao at openjdk.org Fri Jul 15 01:23:05 2022 From: fgao at openjdk.org (Fei Gao) Date: Fri, 15 Jul 2022 01:23:05 GMT Subject: RFR: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: References: Message-ID: <2W6GMUP5HUpdi6E1LvIqTJAdaw5DVBKnPbmNO6FbHZQ=.20619693-2f98-405e-8df1-a9fd35d02cb9@github.com> On Thu, 14 Jul 2022 09:30:26 GMT, Martin Doerr wrote: > Your fix LGTM. The test doesn't show the problem on PPC64, but my original replay file has worked to verify the fix. Thanks for your review and verification, @TheRealMDoerr . ------------- PR: https://git.openjdk.org/jdk/pull/9391 From dholmes at openjdk.org Fri Jul 15 01:46:08 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Jul 2022 01:46:08 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 00:11:16 GMT, Leonid Mesnik wrote: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. 
Hi Leonid, Can we avoid all the agent stuff by defining a JVMTI helper class in the test library: public class JVMTI { private native void stopThread(Thread t, Throwable ex); public void stopThread(Thread t) { stopThread(t, new ThreadDeath()); } } And in cpp file just use GetEnv to get JVMTIEnv and call stopThread? Otherwise the conversions seem quite reasonable. Thanks. test/hotspot/jtreg/vmTestbase/gc/gctests/mallocWithGC2/mallocWithGC2.java line 116: > 114: > 115: tArray[0].join(); // wait for the javaHeapEater Thread to finish > 116: tArray[1].stop(); // Once javaHeapEater is finished, stop the So without this the other thread will run for a full 3 minutes - is that a concern? ------------- PR: https://git.openjdk.org/jdk/pull/9505 From alanb at openjdk.org Fri Jul 15 07:30:58 2022 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 15 Jul 2022 07:30:58 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 00:11:16 GMT, Leonid Mesnik wrote: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. Thanks for doing this. I agree with David that AsyncExceptionOnMonitorEnter AsyncExceptionTest doesn't really need to be started with -agentlib. The can_signal_thread capability can be obtained in the alive phase so JNI code can obtain a JVMTI environment and add this capability. test/hotspot/jtreg/runtime/Thread/AsyncExceptionTest.java line 58: > 56: internalRun1(); > 57: } catch (ThreadDeath td) { > 58: throw new RuntimeException("Catched ThreadDeath in run() instead of internalRun2() or internalRun1(). 
receivedThreadDeathinInternal1=" + receivedThreadDeathinInternal1 + "; receivedThreadDeathinInternal2=" + receivedThreadDeathinInternal2); Drive-by comment: the exception messages mean the lines are 240+ characters line and make it impossible to see changes when using side-by-side diffs. Maybe someday it should be trimming down to something sane. test/hotspot/jtreg/vmTestbase/gc/gctests/mallocWithGC2/mallocWithGC2.java line 118: > 116: } catch (Exception e) { > 117: throw new TestFailure("Test Failed.", e); > 118: } Drive-by comment on this source file is that it looks like it uses 8-space indent everywhere, maybe tabs were converted to 8 spaces by mistake? test/hotspot/jtreg/vmTestbase/nsk/stress/stack/stack002.java line 155: > 153: }; > 154: ***/ > 155: tester.stop = true; Can the comment "The test hangs on JDK 1.2.2 Classic VM" be removed? ------------- PR: https://git.openjdk.org/jdk/pull/9505 From fyang at openjdk.org Fri Jul 15 07:44:03 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 15 Jul 2022 07:44:03 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter In-Reply-To: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: On Thu, 14 Jul 2022 07:42:57 GMT, Feilong Jiang wrote: > As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on riscv backend: > > 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in native ABI [1], we choose x19 for this patch. > 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. > 3. Relate to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > [1]. 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Additional tests: > - hotspot/jdk tier1 on QEMU with Release JDK > - hotspot tier1 on HiFive Unmatched board with Release JDK > - hotspot tier1 on QEMU with Fastdebug JDK Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/interp_masm_riscv.hpp line 96: > 94: } > 95: > 96: void check_extended_sp(const char* msg = "check extended SP") { Redundant space in RHS of assignment. src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 709: > 707: __ add(esp, esp, - entry_size); > 708: __ mv(t0, sp); > 709: __ sd(t0, Address(fp, frame::interpreter_frame_extended_sp_offset * wordSize)); Why not store sp directly here? Looks like use of t0 here is not necessary. src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 779: > 777: __ sub(t0, sp, t0); > 778: __ andi(t0, t0, -16); > 779: // Store extender SP and mirror typo here: should be "extended" instead of "extender". ------------- PR: https://git.openjdk.org/jdk/pull/9487 From fjiang at openjdk.org Fri Jul 15 08:13:58 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 15 Jul 2022 08:13:58 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter [v2] In-Reply-To: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: > As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on riscv backend: > > 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in native ABI [1], we choose x19 for this patch. > 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. > 3. 
Relate to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > [1]. https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Additional tests: > - hotspot/jdk tier1 on QEMU with Release JDK > - hotspot tier1 on HiFive Unmatched board with Release JDK > - hotspot tier1 on QEMU with Fastdebug JDK Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: fix comment and remove unnecessary move sp to t0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9487/files - new: https://git.openjdk.org/jdk/pull/9487/files/2e259d63..c5f8886a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9487&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9487&range=00-01 Stats: 8 lines in 3 files changed: 0 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/9487.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9487/head:pull/9487 PR: https://git.openjdk.org/jdk/pull/9487 From fjiang at openjdk.org Fri Jul 15 08:14:04 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 15 Jul 2022 08:14:04 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter [v2] In-Reply-To: References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: On Fri, 15 Jul 2022 07:40:24 GMT, Fei Yang wrote: >> Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix comment and remove unnecessary move sp to t0 > > Changes requested by fyang (Reviewer). @RealFYang -- Thank you for the comments! Would you please take another look at the new changes? > src/hotspot/cpu/riscv/interp_masm_riscv.hpp line 96: > >> 94: } >> 95: >> 96: void check_extended_sp(const char* msg = "check extended SP") { > > Redundant space in RHS of assignment. Oops, fixed. 
> src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 709: > >> 707: __ add(esp, esp, - entry_size); >> 708: __ mv(t0, sp); >> 709: __ sd(t0, Address(fp, frame::interpreter_frame_extended_sp_offset * wordSize)); > > Why not store sp directly here? Looks like use of t0 here is not necessary. Yes, we can store sp here directly. The same issue in other places was fixed too. > src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 779: > >> 777: __ sub(t0, sp, t0); >> 778: __ andi(t0, t0, -16); >> 779: // Store extender SP and mirror > > typo here: should be "extended" instead of "extender". Fixed. ------------- PR: https://git.openjdk.org/jdk/pull/9487 From fjiang at openjdk.org Fri Jul 15 08:48:39 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 15 Jul 2022 08:48:39 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter [v3] In-Reply-To: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: <1QsfeQGx61mALm6prnRzbZnnFv0GpXt7dSRphaO1P1A=.66f707ad-f281-4330-83da-67c4048d64c2@github.com> > As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on riscv backend: > > 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in native ABI [1], we choose x19 for this patch. > 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. > 3. Relate to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > [1]. 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Additional tests: > - hotspot/jdk tier1 on QEMU with Release JDK > - hotspot tier1 on HiFive Unmatched board with Release JDK > - hotspot tier1 on QEMU with Fastdebug JDK Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: more unnecessary mv sp to tmp register ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9487/files - new: https://git.openjdk.org/jdk/pull/9487/files/c5f8886a..edf203fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9487&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9487&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9487.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9487/head:pull/9487 PR: https://git.openjdk.org/jdk/pull/9487 From fjiang at openjdk.org Fri Jul 15 08:54:59 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 15 Jul 2022 08:54:59 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter [v4] In-Reply-To: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: > As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on riscv backend: > > 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in native ABI [1], we choose x19 for this patch. > 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. > 3. Relate to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > [1]. 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Additional tests: > - hotspot/jdk tier1 on QEMU with Release JDK > - hotspot tier1 on HiFive Unmatched board with Release JDK > - hotspot tier1 on QEMU with Fastdebug JDK Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9487/files - new: https://git.openjdk.org/jdk/pull/9487/files/edf203fd..72c1e4c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9487&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9487&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9487.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9487/head:pull/9487 PR: https://git.openjdk.org/jdk/pull/9487 From tschatzl at openjdk.org Fri Jul 15 11:15:38 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 15 Jul 2022 11:15:38 GMT Subject: RFR: 8290357: Drop HeapRegion::marked_bytes() Message-ID: Hi all, please review this removal of the `HeapRegion::_marked_bytes` member that records the bytes marked below tams in the recent mark as it is not really interesting in the removed places. I added comments to give reasons for this removal in the particular places for your review. 
Testing: jtreg gc/g1, gha Thanks, Thomas ------------- Commit messages: - initial version, use live_bytes instead of marked_bytes Changes: https://git.openjdk.org/jdk/pull/9511/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9511&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290357 Stats: 47 lines in 8 files changed: 0 ins; 36 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/9511.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9511/head:pull/9511 PR: https://git.openjdk.org/jdk/pull/9511 From tschatzl at openjdk.org Fri Jul 15 11:15:39 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 15 Jul 2022 11:15:39 GMT Subject: RFR: 8290357: Drop HeapRegion::marked_bytes() In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 11:06:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this removal of the `HeapRegion::_marked_bytes` member that records the bytes marked below tams in the recent mark as it is not really interesting in the removed places. I added comments to give reasons for this removal in the particular places for your review. > > Testing: jtreg gc/g1, gha > > Thanks, > Thomas src/hotspot/share/gc/g1/g1ConcurrentRebuildAndScrub.cpp line 90: > 88: > 89: void assert_marked_words(HeapRegion* hr) { > 90: assert((_marked_words * HeapWordSize) == hr->marked_bytes(), This use is for verification/debugging only, and removing it (the `_marked_words` member) was suggested during review of the single bitmap change. src/hotspot/share/gc/g1/g1RemSetTrackingPolicy.cpp line 76: > 74: BOOL_TO_STR(selected_for_rebuild), > 75: live_bytes, > 76: r->marked_bytes(), At this point (scrubbing start) `marked_bytes()` contains the amount of marked bytes in the _previous_ marking. This is not particularly interesting to see; `live_bytes` contains the current marked bytes below tams and is printed.
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 441: > 439: > 440: void account_failed_region(HeapRegion* r) { > 441: size_t used_words = r->marked_bytes() / HeapWordSize; At this point, for evacuation failure regions, `marked_bytes() == live_bytes()` so it can be replaced without any change. src/hotspot/share/prims/whitebox.cpp line 607: > 605: bool do_heap_region(HeapRegion* r) { > 606: if (r->is_old()) { > 607: size_t prev_live = r->marked_bytes(); I do not think this has ever been intended (this is the marked bytes below TAMS, not including bytes between TAMS and top()), and for a lower estimate of amount of bytes reclaimed, `live_bytes()` is as good or better. ------------- PR: https://git.openjdk.org/jdk/pull/9511 From mbaesken at openjdk.org Fri Jul 15 12:13:48 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 15 Jul 2022 12:13:48 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v4] In-Reply-To: References: Message-ID: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust test, contentType and label info ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9334/files - new: https://git.openjdk.org/jdk/pull/9334/files/9e4a54b1..b29f2f5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=02-03 Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/9334.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334 PR: https://git.openjdk.org/jdk/pull/9334 From mbaesken at openjdk.org Fri Jul 15 12:28:04 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 15 Jul 2022 12:28:04 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v4] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 12:13:48 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust test, contentType and label info Hi, I adjusted the contentType as suggested, and adjusted the label and TestCodeCacheFull. Regarding triggering the JitRestart event, the suggested TestCodeCacheFull looks like a good start. So I added a freeCodeBlob call corresponding to the allocateCodeBlob call (which triggers the CodeCacheFull event). Unfortunately this did not work in practice and doesn't show the JitRestart event. There may be some delay before the restart happens.
------------- PR: https://git.openjdk.org/jdk/pull/9334 From coleenp at openjdk.org Fri Jul 15 12:45:39 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Jul 2022 12:45:39 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order Message-ID: Most of the analysis in the PR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. Tested with tier1-3. ------------- Commit messages: - 8227060: Optimize safepoint cleanup subtask order Changes: https://git.openjdk.org/jdk/pull/9515/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9515&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8227060 Stats: 37 lines in 5 files changed: 13 ins; 21 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9515.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9515/head:pull/9515 PR: https://git.openjdk.org/jdk/pull/9515 From dholmes at openjdk.org Fri Jul 15 12:53:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Jul 2022 12:53:04 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v2] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 15:06:49 GMT, Coleen Phillimore wrote: >> I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. 
Hi Coleen, I'm really not sure this is actually addressing the issues/concerns that were raised in the bug report. The assertion is good but otherwise nothing has changed has it? Every place you changed a THREAD to a CHECK you've changed the existing behaviour of the code. That existing behaviour is dubious because of the missing CHECK but nevertheless it has now been changed, and it isn't always easily discernible exactly how that change will manifest in higher-level code. (It is somewhat disappointing to see so many remaining places that the THREADS/TRAPS cleanup missed :( ). I think I need to study this one further. Thanks. src/hotspot/share/utilities/exceptions.cpp line 360: > 358: incoming_exception = Handle(thread, thread->pending_exception()); > 359: thread->clear_pending_exception(); > 360: incoming_exception->print(); We shouldn't be unconditionally printing in product mode. If we have an unexpected pending exception then that is a bug in the VM code. The end user won't have a clue what is being printed or why, nor what to do about it. src/hotspot/share/utilities/exceptions.cpp line 361: > 359: thread->clear_pending_exception(); > 360: incoming_exception->print(); > 361: ResourceMark rm; rm(thread); ------------- PR: https://git.openjdk.org/jdk/pull/9492 From aph at openjdk.org Fri Jul 15 13:14:02 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 13:14:02 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v6] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/bc64b79f..1525d272 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=04-05 Stats: 86 lines in 1 file changed: 17 ins; 25 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 15 13:23:00 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 13:23:00 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v7] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. 
Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/1525d272..7f18bcdc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=05-06 Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From mbaesken at openjdk.org Fri Jul 15 14:21:06 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 15 Jul 2022 14:21:06 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v5] In-Reply-To: References: Message-ID: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: add test for JitRestart ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9334/files - new: https://git.openjdk.org/jdk/pull/9334/files/b29f2f5c..69559d00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=03-04 Stats: 104 lines in 1 file changed: 104 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9334.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334 PR: https://git.openjdk.org/jdk/pull/9334 From mbaesken at openjdk.org Fri Jul 15 14:22:20 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 15 Jul 2022 14:22:20 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v4] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 12:13:48 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust test, contentType and label info I added a test after some playing around with the WhiteBox functionality. ------------- PR: https://git.openjdk.org/jdk/pull/9334 From coleenp at openjdk.org Fri Jul 15 14:28:58 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Jul 2022 14:28:58 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v5] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 14:21:06 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add test for JitRestart test/jdk/jdk/jfr/event/compiler/TestJitRestart.java line 33: > 31: import jdk.test.lib.jfr.EventNames; > 32: import jdk.test.lib.jfr.Events; > 33: import sun.hotspot.WhiteBox; This package has been removed. Please use jdk.test.whitebox.WhiteBox. ------------- PR: https://git.openjdk.org/jdk/pull/9334 From coleenp at openjdk.org Fri Jul 15 14:37:06 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Jul 2022 14:37:06 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v2] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 12:38:31 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> second get_user_name_slow call should CHECK_NULL too. > > src/hotspot/share/utilities/exceptions.cpp line 360: > >> 358: incoming_exception = Handle(thread, thread->pending_exception()); >> 359: thread->clear_pending_exception(); >> 360: incoming_exception->print(); > > We shouldn't be unconditionally printing in product mode. If we have an unexpected pending exception then that is a bug in the VM code. The end user won't have a clue what is being printed or why, nor what to do about it. Yes, I agree. I was going to add it under -Xlog:exceptions but if you turn on -Xlog:exceptions, you can see where the pending exception comes from so that's not helpful. I removed it. 
> src/hotspot/share/utilities/exceptions.cpp line 361: > >> 359: thread->clear_pending_exception(); >> 360: incoming_exception->print(); >> 361: ResourceMark rm; > > rm(thread); ok ------------- PR: https://git.openjdk.org/jdk/pull/9492 From adinn at openjdk.org Fri Jul 15 14:39:59 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 15 Jul 2022 14:39:59 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v7] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 13:23:00 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - 8289743: AArch64: Clean up patching logic > - 8289743: AArch64: Clean up patching logic > - 8289743: AArch64: Clean up patching logic src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 106: > 104: // adr/adrp Rx imm21; movk Rx #imm16<<32; ldr/str Ry, [Rx, #offset_in_page] > 105: // adr/adrp Rx imm21; movk Rx #imm16<<32; add Ry, Rx, #offset_in_page > 106: // adr/adrp Rx imm21; movk Rx #imm16<<32 Probably worth adding a note that the patterns with 3 insns occur as targets for retrieval but not for patching. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From coleenp at openjdk.org Fri Jul 15 14:41:17 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Jul 2022 14:41:17 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v2] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 15:06:49 GMT, Coleen Phillimore wrote: >> I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. 
I left the product mode code in to return the pending exception if allocating the exception message doesn't throw OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. By changing THREAD to CHECK, I technically didn't change the behavior of the code because the code for new_exception would have thrown the pending exception if it didn't get an OOM allocating the new exception. For the get_u1, get_u2 case, the stream is truncated so the code wasn't going to get much further anyway. The CR complains about throwing the exception that you get while constructing an exception, which was confusing, but it's the right thing to do. If you get an OOM or StackOverflow creating an exception, you want the OOM or StackOverflow to be returned. See CR for more discussion. ------------- PR: https://git.openjdk.org/jdk/pull/9492 From aph at openjdk.org Fri Jul 15 15:05:52 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 15:05:52 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v8] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/7f18bcdc..94fd755c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=06-07 Stats: 21 lines in 1 file changed: 1 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From jbhateja at openjdk.org Fri Jul 15 15:07:06 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 15 Jul 2022 15:07:06 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 21:03:30 GMT, Jatin Bhateja wrote: >> - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. >> - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8290066: Removing newly added white listed options. Hi @dean-long , @chhagedorn , can you kindly check and approve this. 
------------- PR: https://git.openjdk.org/jdk/pull/9452 From coleenp at openjdk.org Fri Jul 15 15:09:07 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Jul 2022 15:09:07 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v3] In-Reply-To: References: Message-ID: > I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: second get_user_name_slow call should CHECK_NULL too. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9492/files - new: https://git.openjdk.org/jdk/pull/9492/files/daf55a4b..fd1bfeb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9492&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9492&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9492.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9492/head:pull/9492 PR: https://git.openjdk.org/jdk/pull/9492 From ngasson at openjdk.org Fri Jul 15 15:47:06 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Fri, 15 Jul 2022 15:47:06 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v8] In-Reply-To: References: Message-ID: <3sd8WCxOmhMotCpZQMCr2fdcfWVegD_NOOL3Su6BtfY=.91ed37c2-4609-48bb-94fe-97009c264a4c@github.com> On Fri, 15 Jul 2022 15:05:52 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic I ran tier1-3 with the latest version and hit a couple of crashes: runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java # Internal Error (/home/ent-user/ci-scripts/jdk_build/jdk_src/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:355), pid=387892, tid=387895 # assert(address_is == target) failed: should be [..snip..] Stack: [0x0000ffff9b56f000,0x0000ffff9b76f000], sp=0x0000ffff9b768c20, free space=2023k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x145fd44] MacroAssembler::pd_patch_instruction_size(unsigned char*, unsigned char*)+0xe4 V [libjvm.so+0x177e5a0] Relocation::pd_set_data_value(unsigned char*, long, bool)+0x40 V [libjvm.so+0x17788c0] external_word_Relocation::fix_relocation_after_move(CodeBuffer const*, CodeBuffer*)+0x8c V [libjvm.so+0xa6c204] CodeBuffer::relocate_code_to(CodeBuffer*) const+0x470 V [libjvm.so+0xa6f3f4] CodeBuffer::copy_code_to(CodeBlob*)+0x90 V [libjvm.so+0x15c33c4] nmethod::nmethod(Method*, CompilerType, int, int, CodeOffsets*, CodeBuffer*, int, ByteSize, ByteSize, OopMapSet*)+0x1e0 compiler/unsafe/UnsafeGetConstantField.java # SIGSEGV (0xb) at pc=0x0000ffff8585c71c, pid=198485, tid=198515 # # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-git-006c68ae4) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-git-006c68ae4, mixed mode, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0x17bb71c] ScopeDesc::decode_object_values(int)+0xec # [..snip..] 
Stack: [0x0000ffff6016d000,0x0000ffff6036d000], sp=0x0000ffff60366e80, free space=2023k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x17bb71c] ScopeDesc::decode_object_values(int)+0xec V [libjvm.so+0x17bb868] ScopeDesc::ScopeDesc(CompiledMethod const*, PcDesc*, bool)+0x48 V [libjvm.so+0x15bec58] nmethod::scope_desc_in(unsigned char*, unsigned char*)+0xe8 V [libjvm.so+0x15c2990] nmethod::decode2(outputStream*) const+0x600 V [libjvm.so+0xb7ae74] disnm+0x164 V [libjvm.so+0x145fd0c] MacroAssembler::pd_patch_instruction_size(unsigned char*, unsigned char*)+0xac V [libjvm.so+0x177e5a0] Relocation::pd_set_data_value(unsigned char*, long, bool)+0x40 V [libjvm.so+0x17788c0] external_word_Relocation::fix_relocation_after_move(CodeBuffer const*, CodeBuffer*)+0x8c ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 15 16:02:03 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 16:02:03 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v9] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. 
Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/94fd755c..458a1ff8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=07-08 Stats: 14 lines in 2 files changed: 12 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 15 16:09:03 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 16:09:03 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v8] In-Reply-To: <3sd8WCxOmhMotCpZQMCr2fdcfWVegD_NOOL3Su6BtfY=.91ed37c2-4609-48bb-94fe-97009c264a4c@github.com> References: <3sd8WCxOmhMotCpZQMCr2fdcfWVegD_NOOL3Su6BtfY=.91ed37c2-4609-48bb-94fe-97009c264a4c@github.com> Message-ID: On Fri, 15 Jul 2022 15:43:28 GMT, Nick Gasson wrote: > [..snip..] Please either tell me what's in the [..snip..], or which machine and OS and release/debug build you used. Preferably all of these. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From adinn at openjdk.org Fri Jul 15 16:29:06 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 15 Jul 2022 16:29:06 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v9] In-Reply-To: References: Message-ID: <9p0Y9UynLzRr1yKGgvGW2bRXjAI1sstNOzmWS4hsirQ=.427053f5-6151-4cef-bde4-40bda56d4872@github.com> On Fri, 15 Jul 2022 16:02:03 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster.
> > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - 8289743: AArch64: Clean up patching logic > - 8289743: AArch64: Clean up patching logic > - 8289743: AArch64: Clean up patching logic src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 304: > 302: ptrdiff_t offset = target - insn_addr; > 303: if (inner) { > 304: instructions = 2; inner will always be non null so assert instead of branch? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 313: > 311: } > 312: // movk has handled the upper bits. Now we extract the lower 19 > 313: // bits of the signed offset field for the ADRP. It's not always a movk though is it? ------------- PR: https://git.openjdk.org/jdk/pull/9398 From ngasson at openjdk.org Fri Jul 15 16:47:21 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Fri, 15 Jul 2022 16:47:21 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v8] In-Reply-To: References: <3sd8WCxOmhMotCpZQMCr2fdcfWVegD_NOOL3Su6BtfY=.91ed37c2-4609-48bb-94fe-97009c264a4c@github.com> Message-ID: On Fri, 15 Jul 2022 16:05:26 GMT, Andrew Haley wrote: > > Please either tell me what's in the [..snip..], or which machine and OS and release/debug build you used. Preferable all of these. I mailed them to you. 
Both can be reproduced on a fastdebug build: make exploded-test TEST="runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java" make exploded-test TEST="compiler/unsafe/UnsafeGetConstantField.java" ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 15 16:47:24 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 16:47:24 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v9] In-Reply-To: <9p0Y9UynLzRr1yKGgvGW2bRXjAI1sstNOzmWS4hsirQ=.427053f5-6151-4cef-bde4-40bda56d4872@github.com> References: <9p0Y9UynLzRr1yKGgvGW2bRXjAI1sstNOzmWS4hsirQ=.427053f5-6151-4cef-bde4-40bda56d4872@github.com> Message-ID: On Fri, 15 Jul 2022 16:25:00 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: >> >> - 8289743: AArch64: Clean up patching logic >> - 8289743: AArch64: Clean up patching logic >> - 8289743: AArch64: Clean up patching logic > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 304: > >> 302: ptrdiff_t offset = target - insn_addr; >> 303: if (inner) { >> 304: instructions = 2; > > inner will always be non null so assert instead of branch? It'll segfault anyway. I don't think we normally assert for such things, but I don't mind if you insist. > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 313: > >> 311: } >> 312: // movk has handled the upper bits. Now we extract the lower 19 >> 313: // bits of the signed offset field for the ADRP. > > It's not always a movk though is it? True, I'll reword it. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 15 17:02:02 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 17:02:02 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v8] In-Reply-To: References: <3sd8WCxOmhMotCpZQMCr2fdcfWVegD_NOOL3Su6BtfY=.91ed37c2-4609-48bb-94fe-97009c264a4c@github.com> Message-ID: <154tZVokMhkUlW4teEJVVhWoI_hhPEBdmaR8pzfOxWQ=.3a8136dd-a898-4fce-9172-d3682ad6ca67@github.com> On Fri, 15 Jul 2022 16:44:56 GMT, Nick Gasson wrote: > I mailed them to you. Both can be reproduced on a fastdebug build: Found it, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 15 17:08:35 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 15 Jul 2022 17:08:35 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v10] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/458a1ff8..c9eb876d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=08-09 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From dcubed at openjdk.org Fri Jul 15 18:05:17 2022 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Fri, 15 Jul 2022 18:05:17 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 00:11:16 GMT, Leonid Mesnik wrote: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. Just a few comments/observations. ------------- Changes requested by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/9505 From dcubed at openjdk.org Fri Jul 15 18:05:19 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 15 Jul 2022 18:05:19 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 01:29:08 GMT, David Holmes wrote: >> The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. > > test/hotspot/jtreg/vmTestbase/gc/gctests/mallocWithGC2/mallocWithGC2.java line 116: > >> 114: >> 115: tArray[0].join(); // wait for the javaHeapEater Thread to finish >> 116: tArray[1].stop(); // Once javaHeapEater is finished, stop the > > So without this the other thread will run for a full 3 minutes - is that a concern? Looks like the test is using the default timeout value of 120 seconds/2 minutes. With the usual timeoutFactor of 4 (or higher) used by Mach5, this should be okay in that environment, but it might not be if invoked with a timeoutFactor not set (which defaults to 1, IIRC). ------------- PR: https://git.openjdk.org/jdk/pull/9505 From dcubed at openjdk.org Fri Jul 15 18:05:21 2022 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Fri, 15 Jul 2022 18:05:21 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 07:27:34 GMT, Alan Bateman wrote: >> The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. > > test/hotspot/jtreg/vmTestbase/nsk/stress/stack/stack002.java line 155: > >> 153: }; >> 154: ***/ >> 155: tester.stop = true; > > Can the comment "The test hangs on JDK 1.2.2 Classic VM" be removed? The whole commented out block seems like it could be deleted. ------------- PR: https://git.openjdk.org/jdk/pull/9505 From coleenp at openjdk.org Fri Jul 15 20:15:54 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Jul 2022 20:15:54 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order [v2] In-Reply-To: References: Message-ID: > Most of the analysis in the CR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Kim's improvement ideas. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/9515/files - new: https://git.openjdk.org/jdk/pull/9515/files/6e5786c9..9e49a30b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9515&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9515&range=00-01 Stats: 23 lines in 1 file changed: 8 ins; 9 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/9515.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9515/head:pull/9515 PR: https://git.openjdk.org/jdk/pull/9515 From kbarrett at openjdk.org Sat Jul 16 00:52:12 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 16 Jul 2022 00:52:12 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order [v2] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 20:15:54 GMT, Coleen Phillimore wrote: >> Most of the analysis in the CR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Kim's improvement ideas. A bit of history here. It seems that some of the point of the original CR has disappeared. There used to be a use of parallel_java_threads_do at the start (from JDK-8180932), with the CR suggesting moving that to the end. That disappeared with JDK-8246476. So some of the point of the CR has gone away. But ordering the remaining tasks from expensive to cheap is still sensible, and the new order mostly [*] looks plausible. A different (pre-existing) problem is that the use of the workers isn't great. The amount of parallelism is just the workgang's current `active_workers()`, with no regard to how much parallelism we have. Presently the maximum useful amount of parallelism is the number of subtasks, so 6 (which might easily and likely be reduced with some pre-checks). 
So we're going to apply `active_workers` threads (whatever that happens to be at the moment) to a task which can use only a relatively small and fixed [*] number of threads. Improving the use of the workgang should be a separate RFE. [*] JDK-8253180 later (after JDK-8246476) introduced the serial threads_do to set GC watermarks. Digging into it a bit, that doesn't look obviously lightweight; I wonder if it should be (should have been) parallelized (and placed at the end of the work). But that's a separate RFE. src/hotspot/share/runtime/safepoint.cpp line 522: > 520: private: > 521: SubTasksDone _subtasks; > 522: uint _num_workers; [pre-existing] `_num_workers` seems to no longer be used. src/hotspot/share/runtime/safepoint.cpp line 541: > 539: }; > 540: > 541: class SafepointCleanupThreadClosure : public ThreadClosure { The new name doesn't seem to have much more to do with what it does than did the old name. The class definition could be moved to the single point of use and just called "Closure" to avoid needing to come up with a good name :) ------------- PR: https://git.openjdk.org/jdk/pull/9515 From sviswanathan at openjdk.org Sat Jul 16 01:04:03 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 16 Jul 2022 01:04:03 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 21:03:30 GMT, Jatin Bhateja wrote: >> - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. >> - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8290066: Removing newly added white listed options. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.org/jdk/pull/9452 From jbhateja at openjdk.org Sat Jul 16 01:20:08 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 16 Jul 2022 01:20:08 GMT Subject: RFR: 8290066: Remove KNL specific handling for new CPU target check in IR annotation [v2] In-Reply-To: References: Message-ID: On Sat, 16 Jul 2022 01:00:12 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8290066: Removing newly added white listed options. > > Looks good to me. Thanks @sviswa7 , @vnkozlov ------------- PR: https://git.openjdk.org/jdk/pull/9452 From jbhateja at openjdk.org Sat Jul 16 01:22:10 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 16 Jul 2022 01:22:10 GMT Subject: Integrated: 8290066: Remove KNL specific handling for new CPU target check in IR annotation In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 12:55:02 GMT, Jatin Bhateja wrote: > - Newly added annotations query the CPU feature using white box API which returns the list of features enabled during VM initialization. > - With JVM flag UseKNLSetting, during VM initialization AVX512 features not supported by KNL target are disabled, thus we do not need any special handling for KNL in newly introduced IR annotations (applyCPUFeature, applyCPUFeatureOr, applyCPUFeatureAnd). > > Please review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 2342684f Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/2342684f2cd91a2e5f43dd271e95836aa78e7d0a Stats: 190 lines in 4 files changed: 81 ins; 108 del; 1 mod 8290066: Remove KNL specific handling for new CPU target check in IR annotation Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/9452 From coleenp at openjdk.org Sat Jul 16 15:22:56 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 16 Jul 2022 15:22:56 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order [v3] In-Reply-To: References: Message-ID: > Most of the analysis in the CR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: More Kim improvements. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9515/files - new: https://git.openjdk.org/jdk/pull/9515/files/9e49a30b..c47551fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9515&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9515&range=01-02 Stats: 21 lines in 1 file changed: 7 ins; 12 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9515.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9515/head:pull/9515 PR: https://git.openjdk.org/jdk/pull/9515 From coleenp at openjdk.org Sat Jul 16 15:26:53 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 16 Jul 2022 15:26:53 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order [v2] In-Reply-To: References: Message-ID: On Sat, 16 Jul 2022 00:50:09 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Kim's improvement ideas. > > A bit of history here. 
It seems that some of the point of the original CR has > disappeared. There used to be a use of parallel_java_threads_do at the start > (from JDK-8180932), with the CR suggesting moving that to the end. That > disappeared with JDK-8246476. So some of the point of the CR has gone away. > But ordering the remaining tasks from expensive to cheap is still sensible, > and the new order mostly [*] looks plausible. > > A different (pre-existing) problem is that the use of the workers isn't great. > The amount of parallelism is just the workgang's current `active_workers()`, > with no regard to how much parallelism we have. Presently the maximum useful > amount of parallelism is the number of subtasks, so 6 (which might easily and > likely be reduced with some pre-checks). So we're going to apply > `active_workers` threads (whatever that happens to be at the moment) to a task > which can use only a relatively small and fixed [*] number of threads. > Improving the use of the workgang should be a separate RFE. > > [*] JDK-8253180 later (after JDK-8246476) introduced the serial threads_do to > set GC watermarks. Digging into it a bit, that doesn't look obviously > lightweight; I wonder if it should be (should have been) parallelized (and > placed at the end of the work). But that's a separate RFE. @kimbarrett I made your suggested improvements, even though a class in the middle of a function looks odd to me. Maybe someday it can be replaced with a lambda. Reran tier1 tests locally. I agree that the problem that this was supposed to solve may be gone now, but if we do resizing and rehashing, it would be good to have that in parallel with the other tasks. The threads are already created; otherwise the cost of creating them would not be worth making this parallel. I can't comment on how much work the lazy stack watermark processing is.
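For illustration only (not part of the thread and not HotSpot code): a minimal Java sketch of the point about useful parallelism being bounded by the number of subtasks. The class `SubTaskClaimDemo`, the method `runCleanup`, and the subtask names are hypothetical, and the shared-counter claiming used here is a simpler scheme than HotSpot's `SubTasksDone`. Subtasks are listed from expensive to cheap, so the expensive ones are claimed first and overlap with the cheap ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SubTaskClaimDemo {
    // Hypothetical subtask names, ordered from most to least expensive.
    static final List<String> SUBTASKS = List.of(
        "rehash-symbol-table", "rehash-string-table", "resize-dictionary",
        "cheap-task-a", "cheap-task-b", "cheap-task-c");

    /** Runs every subtask once on `workers` threads; returns the claiming worker id per subtask. */
    static Map<String, Integer> runCleanup(int workers) {
        AtomicInteger next = new AtomicInteger();            // index of the next unclaimed subtask
        Map<String, Integer> claimedBy = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                int i;
                // Each subtask index is handed out exactly once, in list order.
                while ((i = next.getAndIncrement()) < SUBTASKS.size()) {
                    claimedBy.put(SUBTASKS.get(i), id);      // stand-in for doing the work
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return claimedBy;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = runCleanup(16);
        long busy = result.values().stream().distinct().count();
        System.out.println("subtasks done: " + result.size()
            + ", workers that did any work: " + busy);
    }
}
```

However many workers are started, no more than `SUBTASKS.size()` of them can ever claim anything, which is the reason for sizing the worker count to the subtask count in a follow-up change.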
------------- PR: https://git.openjdk.org/jdk/pull/9515 From kbarrett at openjdk.org Sun Jul 17 18:38:05 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 17 Jul 2022 18:38:05 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order [v3] In-Reply-To: References: Message-ID: On Sat, 16 Jul 2022 15:22:56 GMT, Coleen Phillimore wrote: >> Most of the analysis in the CR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > More Kim improvements. Looks good. There may be a couple of followup RFEs to be filed; we can discuss those later. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/9515 From lmesnik at openjdk.org Sun Jul 17 20:37:04 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 20:37:04 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v2] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: <_3tP6vFxkkqiSKOuGocHk9-v32KsXOCH2ZsPuZQahYQ=.fdf5b556-7d4a-4b32-8f1b-1479ff0dc969@github.com> > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: runtime/Thread fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/d2150550..88a6f539 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=00-01 Stats: 109 lines in 5 files changed: 4 ins; 98 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 20:41:19 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 20:41:19 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v3] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: added lib files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/88a6f539..b22083b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=01-02 Stats: 103 lines in 2 files changed: 103 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 20:58:02 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 20:58:02 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v4] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: headers fixed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/b22083b6..a172ce75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=02-03 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 21:59:21 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 21:59:21 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v4] In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 17:56:35 GMT, Daniel D. Daugherty wrote: >> test/hotspot/jtreg/vmTestbase/gc/gctests/mallocWithGC2/mallocWithGC2.java line 116: >> >>> 114: >>> 115: tArray[0].join(); // wait for the javaHeapEater Thread to finish >>> 116: tArray[1].stop(); // Once javaHeapEater is finished, stop the >> >> So without this the other thread will run for a full 3 minutes - is that a concern? > > Looks like the test is using the default timeout value of 120 seconds/2 minutes. > With the usual timeoutFactor of 4 (or higher) used by Mach5, this should be > okay in that environment, but it might not be if invoked with a timeoutFactor > not set (which defaults to 1, IIRC). The VM doesn't wait for the completion of threads that are executing native code. Seems like a new bug?
------------- PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 22:07:23 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 22:07:23 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v5] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision: - long line fixed - stack fixed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/a172ce75..bd1af178 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=03-04 Stats: 33 lines in 2 files changed: 4 ins; 17 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 22:11:35 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 22:11:35 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v5] In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: <4WY5Ca1r7rRPY85MRVTRfAZrRWtqGOwNu0d5QyZvP_8=.b0cf2f6b-08b8-4b15-8c2e-7ea2bbc2e637@github.com> On Sun, 17 Jul 2022 22:07:23 GMT, Leonid Mesnik wrote: >> The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. 
> > Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision: > > - long line fixed > - stack fixed Fixed the stopThread usage and addressed the other comments (or replied to them). ------------- PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 22:11:36 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 22:11:36 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v5] In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 07:15:49 GMT, Alan Bateman wrote: >> Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision: >> >> - long line fixed >> - stack fixed > > test/hotspot/jtreg/vmTestbase/gc/gctests/mallocWithGC2/mallocWithGC2.java line 118: > >> 116: } catch (Exception e) { >> 117: throw new TestFailure("Test Failed.", e); >> 118: } > > Drive-by comment on this source file is that it looks like it uses 8-space indent everywhere, maybe tabs were converted to 8 spaces by mistake? There are a lot of old tests with 8-space indentation. I don't want to fix it in this bugfix. Makes sense? ------------- PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 22:23:08 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 22:23:08 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v6] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: Update libAsyncExceptionOnMonitorEnter.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/bd1af178..3f8c176b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=04-05 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 22:33:05 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 22:33:05 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v7] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: <_qStQA2nkK_AdzXE6GFzLWp42uX7Wl6H_7E-MLscZ8k=.708531ea-cc04-4f00-bf71-7e443b729532@github.com> > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: upd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/3f8c176b..8f482f36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=05-06 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Sun Jul 17 22:35:30 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 17 Jul 2022 22:35:30 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v8] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: > The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: u ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/8f482f36..628ff010 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From dholmes at openjdk.org Sun Jul 17 23:03:37 2022 From: dholmes at openjdk.org (David Holmes) Date: Sun, 17 Jul 2022 23:03:37 GMT Subject: [jdk19] RFR: 8278274: Update nroff pages in JDK 19 before RC Message-ID: Please review these changes to the nroff manpage files so that they match their markdown sources that Oracle maintains. All pages at a minimum have 19-ea replaced with 19, and copyright set to 2022 if needed. Additionally: The Java manpage was missing updates from: - [JDK-8282018](https://bugs.openjdk.org/browse/JDK-8282018): Add captions to tables on java man page. The Java manpage has slight formatting differences from: - [JDK-8262004](https://bugs.openjdk.org/browse/JDK-8262004): Classpath separator: Man page says semicolon; should be colon on Linux - [JDK-8236569](https://bugs.openjdk.org/browse/JDK-8236569): -Xss not multiple of 4K does not work for the main thread on macOS The Java manpage has a typo fixed in mainline by [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) (for JDK 20) The keytool manpage was missing updates from: - [JDK-8282014](https://bugs.openjdk.org/browse/JDK-8282014): Add captions to tables on keytool man page. 
- [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA The jar manpage was missing updates from: - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) The jarsigner manpage was missing updates from: - [JDK-8282015](https://bugs.openjdk.org/browse/JDK-8282015): Add captions to tables on jarsigner man page. - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA The javadoc manpage was missing updates from: - [JDK-8279034](https://bugs.openjdk.org/browse/JDK-8279034): Update man page for javadoc `--date` option The jmod manpage was missing updates from: - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) The jpackage manpage was missing updates from: - [JDK-8285146](https://bugs.openjdk.org/browse/JDK-8285146): Document jpackage resource dir feature - [JDK-8284695](https://bugs.openjdk.org/browse/JDK-8284695): Update jpackage man pages for JDK 19 - [JDK-8284209](https://bugs.openjdk.org/browse/JDK-8284209): Replace remaining usages of 'a the' in source code The jshell manpage was missing updates from: - [JDK-8282016](https://bugs.openjdk.org/browse/JDK-8282016): Add captions to tables on jshell man page. 
------------- Commit messages: - 8278274: Update nroff pages in JDK 19 before RC Changes: https://git.openjdk.org/jdk19/pull/145/files Webrev: https://webrevs.openjdk.org/?repo=jdk19&pr=145&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8278274 Stats: 515 lines in 28 files changed: 431 ins; 16 del; 68 mod Patch: https://git.openjdk.org/jdk19/pull/145.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/145/head:pull/145 PR: https://git.openjdk.org/jdk19/pull/145 From dholmes at openjdk.org Mon Jul 18 01:16:36 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Jul 2022 01:16:36 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v8] In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Sun, 17 Jul 2022 22:35:30 GMT, Leonid Mesnik wrote: >> The tests are updated to don't use Thread.stop(). Tests whose intention is to verify async exception updated to use jvmti StopThread. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > u Thanks for the library update -looks good! A couple of minor typo "targets of opportunity" listed below. One discussion still somewhat open. Thanks. test/hotspot/jtreg/runtime/Thread/AsyncExceptionOnMonitorEnter.java line 42: > 40: private static int TEST_MODE = 0; > 41: > 42: public static native int stopThread(Thread thread); This seems unused now. 
test/hotspot/jtreg/runtime/Thread/AsyncExceptionTest.java line 57: > 55: internalRun1(); > 56: } catch (ThreadDeath td) { > 57: throw new RuntimeException("Catched ThreadDeath in run() instead of internalRun2() or internalRun1().\n" existing: s/Catched/Caught/ test/hotspot/jtreg/runtime/Thread/AsyncExceptionTest.java line 65: > 63: > 64: if (receivedThreadDeathinInternal2 == false && receivedThreadDeathinInternal1 == false) { > 65: throw new RuntimeException("Didn't catched ThreadDeath in internalRun2() nor in internalRun1().\n" Existing: s/catched/catch/ ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9505 From dholmes at openjdk.org Mon Jul 18 01:16:37 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Jul 2022 01:16:37 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v8] In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Sun, 17 Jul 2022 21:55:47 GMT, Leonid Mesnik wrote: >> Looks like the test is using the default timeout value of 120 seconds/2 minutes. >> With the usual timeoutFactor of 4 (or higher) used by Mach5, this should be >> okay in that environment, but it might not be if invoked with a timeoutFactor >> not set (which defaults to 1, IIRC). > > VM doesn't wait for the completion of threads that are executed native code. Seems like a new bug? @lmesnik I don't understand what you mean here. The cHeapEater thread is a non-daemon thread so the test program won't terminate until it does. In the old code we would assist it to terminate early using stop() but in the new code it will run until it completes normally after 180 seconds. 
------------- PR: https://git.openjdk.org/jdk/pull/9505 From fyang at openjdk.org Mon Jul 18 01:24:48 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 18 Jul 2022 01:24:48 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter [v4] In-Reply-To: References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: On Fri, 15 Jul 2022 08:54:59 GMT, Feilong Jiang wrote: >> As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on riscv backend: >> >> 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in native ABI [1], we choose x19 for this patch. >> 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. >> 3. Relate to 1, we should clearly label all the places where the caller's SP is passed to a callee. >> >> [1]. https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc >> >> Additional tests: >> - hotspot/jdk tier1 on QEMU with Release JDK >> - hotspot tier1 on HiFive Unmatched board with Release JDK >> - hotspot tier1 on QEMU with Fastdebug JDK >> - jtreg full on QEMU with Release JDK > > Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: > > fix Updated changes looks good. ------------- Marked as reviewed by fyang (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9487 From fjiang at openjdk.org Mon Jul 18 01:34:02 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 18 Jul 2022 01:34:02 GMT Subject: RFR: 8290280: riscv: Clean up stack and register handling in interpreter [v4] In-Reply-To: References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: On Mon, 18 Jul 2022 01:22:04 GMT, Fei Yang wrote: >> Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix > > Updated changes looks good. @RealFYang Thanks? ------------- PR: https://git.openjdk.org/jdk/pull/9487 From fjiang at openjdk.org Mon Jul 18 02:15:19 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 18 Jul 2022 02:15:19 GMT Subject: Integrated: 8290280: riscv: Clean up stack and register handling in interpreter In-Reply-To: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> References: <5GXQO3Z1NGxEo0DWgJ8wY17TAX5Tsxo0ZPSpgEwZzlw=.10de92f0-c879-4396-9573-4fe82f52eb24@github.com> Message-ID: On Thu, 14 Jul 2022 07:42:57 GMT, Feilong Jiang wrote: > As [JDK-8288971](https://bugs.openjdk.org/browse/JDK-8288971) described, we have the same issue on riscv backend: > > 1. We use x30 to pass the caller's SP to a callee through adapters. x30 is not a callee-saved register in native ABI [1], we choose x19 for this patch. > 2. We frequently recalculate the location where the native SP needs to go. We have a spare slot in the interpreter frame, so we should calculate it once, when the frame is created, and use it. > 3. Relate to 1, we should clearly label all the places where the caller's SP is passed to a callee. > > [1]. 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Additional tests: > - hotspot/jdk tier1 on QEMU with Release JDK > - hotspot tier1 on HiFive Unmatched board with Release JDK > - hotspot tier1 on QEMU with Fastdebug JDK > - jtreg full on QEMU with Release JDK This pull request has now been integrated. Changeset: 4dd236b4 Author: Feilong Jiang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/4dd236b40abfeb1200e884021b90226046bc4b85 Stats: 144 lines in 11 files changed: 66 ins; 35 del; 43 mod 8290280: riscv: Clean up stack and register handling in interpreter Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/9487 From fgao at openjdk.org Mon Jul 18 05:58:49 2022 From: fgao at openjdk.org (Fei Gao) Date: Mon, 18 Jul 2022 05:58:49 GMT Subject: Integrated: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 07:51:01 GMT, Fei Gao wrote: > Superword doesn't vectorize any nodes of non-primitive types and > thus sets `allow_address` false when calling type2aelembytes() in > SuperWord::data_size()[1]. Therefore, when we try to resolve the > data size for a node of T_ADDRESS type, the assertion in > type2aelembytes()[2] takes effect. > > We try to resolve the data sizes for node s and node t in the > SuperWord::adjust_alignment_for_type_conversion()[3] when type > conversion between different data sizes happens. The issue is, > when node s is a ConvI2L node and node t is an AddP node of > T_ADDRESS type, type2aelembytes() will assert. To fix it, we > should filter out all non-primitive nodes, like the patch does > in SuperWord::adjust_alignment_for_type_conversion(). Since > it's a failure in the mid-end, all superword available platforms > are affected. In my local test, this failure can be reproduced > on both x86 and aarch64. With this patch, the failure can be fixed. 
> > Apart from fixing the bug, the patch also adds necessary type check > and does some clean-up in SuperWord::longer_type_for_conversion() > and VectorCastNode::implemented(). > > [1]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1417 > [2]https://github.com/openjdk/jdk/blob/b96ba19807845739b36274efb168dd048db819a3/src/hotspot/share/utilities/globalDefinitions.cpp#L326 > [3]https://github.com/openjdk/jdk/blob/dddd4e7c81fccd82b0fd37ea4583ce1a8e175919/src/hotspot/share/opto/superword.cpp#L1454 This pull request has now been integrated. Changeset: 87340fd5 Author: Fei Gao Committer: Ningsheng Jian URL: https://git.openjdk.org/jdk/commit/87340fd5408d89d9343541ff4fcabde83548a598 Stats: 116 lines in 5 files changed: 89 ins; 9 del; 18 mod 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091 Reviewed-by: kvn, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/9391 From dholmes at openjdk.org Mon Jul 18 06:34:48 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Jul 2022 06:34:48 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v3] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 15:09:07 GMT, Coleen Phillimore wrote: >> I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. You have changed behaviour - even if things are actually more correct. 
For example, given:

```
if (tag == ITEM_Object) {
  u2 class_index = _stream->get_u2(CHECK_NT);
  int nconstants = _cp->length();
  if ((class_index <= 0 || class_index >= nconstants) ||
      (!_cp->tag_at(class_index).is_klass() &&
       !_cp->tag_at(class_index).is_unresolved_klass())) {
    _stream->stackmap_format_error("bad class index", THREAD);
    return VerificationType::bogus_type();
  }
```

If the `get_u2` now fails due to classfile truncation, then we will throw that exception instead of the "bad class index" format error. ------------- PR: https://git.openjdk.org/jdk/pull/9492 From mbaesken at openjdk.org Mon Jul 18 07:22:00 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 18 Jul 2022 07:22:00 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v6] In-Reply-To: References: Message-ID: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. Matthias Baesken has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8289524 - Remove JitRestart from untested events list - add test for JitRestart - Adjust test, contentType and label info - Bring back JitRestart event, add codeCacheMaxCapacity - Incorporate JIT compiler restart into EventSweepCodeCache - JDK-8289524 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9334/files - new: https://git.openjdk.org/jdk/pull/9334/files/69559d00..61d9d902 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=04-05 Stats: 8713 lines in 253 files changed: 5494 ins; 1823 del; 1396 mod Patch: https://git.openjdk.org/jdk/pull/9334.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334 PR: https://git.openjdk.org/jdk/pull/9334 From mbaesken at openjdk.org Mon Jul 18 08:11:59 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 18 Jul 2022 08:11:59 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v7] In-Reply-To: References: Message-ID: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: WhiteBox renaming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9334/files - new: https://git.openjdk.org/jdk/pull/9334/files/61d9d902..085ab42f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9334.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334 PR: https://git.openjdk.org/jdk/pull/9334 From lucy at openjdk.org Mon Jul 18 08:12:03 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 18 Jul 2022 08:12:03 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v3] In-Reply-To: References: Message-ID: <6Rs7_67Yv0ipmcTbHGLxhF9cWlJLtUumWGBbQQk64BM=.29b83d56-1c12-4626-8636-81c44cf684fe@github.com> On Thu, 14 Jul 2022 17:03:14 GMT, Markus Grönlund wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> Bring back JitRestart event, add codeCacheMaxCapacity > > src/hotspot/share/jfr/metadata/metadata.xml line 563: > >> 561: >> 562: >> 563: > > Is codeCacheMaxCapacity the current in-use size of the CodeCache, in bytes? It should have the contentType="bytes" in that case. Also "freedMemory" should have the same contentType. Yes, codeCacheMaxCapacity is given in bytes. It specifies the maximum size, not the current in-use size, of the CodeCache. It is the sum over all CodeHeap segments.
------------- PR: https://git.openjdk.org/jdk/pull/9334 From mbaesken at openjdk.org Mon Jul 18 08:12:06 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 18 Jul 2022 08:12:06 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v5] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 14:25:47 GMT, Coleen Phillimore wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> add test for JitRestart > > test/jdk/jdk/jfr/event/compiler/TestJitRestart.java line 33: > >> 31: import jdk.test.lib.jfr.EventNames; >> 32: import jdk.test.lib.jfr.Events; >> 33: import sun.hotspot.WhiteBox; > > This package has been removed. Please use jdk.test.whitebox.WhiteBox. Hi Coleen thanks for the advice. After switching to the new package jdk.test.whitebox.WhiteBox I get java.lang.UnsatisfiedLinkError: 'void sun.hotspot.WhiteBox.registerNatives()' Do I need to do more than just renaming the package ? ------------- PR: https://git.openjdk.org/jdk/pull/9334 From mbaesken at openjdk.org Mon Jul 18 08:15:29 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 18 Jul 2022 08:15:29 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v8] In-Reply-To: References: Message-ID: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: BlobType got a new package too ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9334/files - new: https://git.openjdk.org/jdk/pull/9334/files/085ab42f..a9f72afa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9334&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9334.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334 PR: https://git.openjdk.org/jdk/pull/9334 From adinn at openjdk.org Mon Jul 18 09:07:39 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 18 Jul 2022 09:07:39 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v9] In-Reply-To: References: <9p0Y9UynLzRr1yKGgvGW2bRXjAI1sstNOzmWS4hsirQ=.427053f5-6151-4cef-bde4-40bda56d4872@github.com> Message-ID: On Fri, 15 Jul 2022 16:41:34 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 304: >> >>> 302: ptrdiff_t offset = target - insn_addr; >>> 303: if (inner) { >>> 304: instructions = 2; >> >> inner will always be non null so assert instead of branch? > > It'll segfault anyway. I don't think we normally assert for such things, but I don't mind if you insist. The rationale for replacing the if with an assert was as much to clarify expectations as to guard against errors. 
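The difference between the two forms discussed above can be sketched in plain C++. This is only an illustration, not the HotSpot patching code: the function names and the instruction counts are hypothetical, and the sketch assumes the precondition that `inner` is never null.

```cpp
#include <cassert>

// Hypothetical stand-ins for the patched code: the names and the instruction
// counts are illustrative, not taken from macroAssembler_aarch64.cpp.

// Branching form: quietly tolerates a null pointer, which suggests to the
// reader that a null `inner` is an expected case.
int instructions_with_branch(const unsigned* inner) {
  int instructions = 1;
  if (inner) {
    instructions = 2;
  }
  return instructions;
}

// Asserting form: documents that callers must always pass a non-null
// pointer, and checks it in builds where assert is enabled.
int instructions_with_assert(const unsigned* inner) {
  assert(inner != nullptr && "inner is expected to be non-null");
  (void)inner;  // avoids an unused-parameter warning when NDEBUG is defined
  return 2;
}
```

Both forms return the same result for a valid pointer; the asserting form simply makes the expectation visible at the point of use.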
------------- PR: https://git.openjdk.org/jdk/pull/9398 From mbaesken at openjdk.org Mon Jul 18 09:16:04 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 18 Jul 2022 09:16:04 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v5] In-Reply-To: References: Message-ID: On Mon, 18 Jul 2022 08:07:42 GMT, Matthias Baesken wrote: >> test/jdk/jdk/jfr/event/compiler/TestJitRestart.java line 33: >> >>> 31: import jdk.test.lib.jfr.EventNames; >>> 32: import jdk.test.lib.jfr.Events; >>> 33: import sun.hotspot.WhiteBox; >> >> This package has been removed. Please use jdk.test.whitebox.WhiteBox. > > Hi Coleen thanks for the advice. After switching to the new package jdk.test.whitebox.WhiteBox I get > java.lang.UnsatisfiedLinkError: 'void sun.hotspot.WhiteBox.registerNatives()' > Do I need to do more than just renaming the package ? Had to change BlobType import too, this one is also in a new package. ------------- PR: https://git.openjdk.org/jdk/pull/9334 From aph at openjdk.org Mon Jul 18 10:13:59 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 18 Jul 2022 10:13:59 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v11] In-Reply-To: References: Message-ID: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/c9eb876d..12556196 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=09-10 Stats: 27 lines in 1 file changed: 16 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From shade at openjdk.org Mon Jul 18 10:14:03 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 18 Jul 2022 10:14:03 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: References: Message-ID: <9u8Xm0dpTJPt_1-RHkVe4F6m6ca2jWuGdxn-PXfVZ5Y=.84109ac5-fb77-47c9-a706-6422242052eb@github.com> On Tue, 12 Jul 2022 08:14:05 GMT, Fei Yang wrote: > Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit code for load + add/sub + store sequence adding arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. > > We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that it will be more clear in code logic at the call sites. > > Testing: tier1 tested on riscv64-linux unmatched board. Looks fine, with one minor question.
src/hotspot/cpu/riscv/c1_LIRAssembler_arraycopy_riscv.cpp line 60: > 58: #ifndef PRODUCT > 59: if (PrintC1Statistics) { > 60: __ incrementw(ExternalAddress((address)&Runtime1::_generic_arraycopystub_cnt)); Is it really `incrementw`, though? These counter fields are `int`-s, are they still 32-bit on RISC-V? If so, shouldn't it be `incrementl`? ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/9461 From aph at openjdk.org Mon Jul 18 10:37:13 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 18 Jul 2022 10:37:13 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v9] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 16:02:03 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - 8289743: AArch64: Clean up patching logic > - 8289743: AArch64: Clean up patching logic > - 8289743: AArch64: Clean up patching logic src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 461: > 459: uint32_t insn = ((uint32_t*)insn_addr)[0]; > 460: int *insn3_addr = &((int*)insn_addr)[2]; > 461: uint32_t insn3 = (uint32_t)SafeFetch32(insn3_addr, -1); I'm wondering if this is safe. Maybe something like `adrp;movk` could be followed by not-an-instruction which looked like an offset. However, I think that's impossible because anything following would be executed immediately following the `movk`. For the same reason, the `adrp;movk` can't be at the very end of an executable page, so I suppose the use of `SafeFetch32` is unnecessary too. It's still a code smell, though, even if it is safe. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Mon Jul 18 11:07:00 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 18 Jul 2022 11:07:00 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v12] In-Reply-To: References: Message-ID: <7pvfrFUYE8ASMzhQVEpoCJosqf2xu2XJVwrozZer2xQ=.67e4d8ec-9874-451d-a34e-3a7b23bafda5@github.com> > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - 8289743: AArch64: Clean up patching logic - 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/12556196..7b647d17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=10-11 Stats: 29 lines in 1 file changed: 10 ins; 2 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From fyang at openjdk.org Mon Jul 18 11:38:48 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 18 Jul 2022 11:38:48 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: <9u8Xm0dpTJPt_1-RHkVe4F6m6ca2jWuGdxn-PXfVZ5Y=.84109ac5-fb77-47c9-a706-6422242052eb@github.com> References: <9u8Xm0dpTJPt_1-RHkVe4F6m6ca2jWuGdxn-PXfVZ5Y=.84109ac5-fb77-47c9-a706-6422242052eb@github.com> Message-ID: <1UN6Yhfh84XWZ-8z8QXIxc0gk5sdgdFfQH4iuVibWDw=.86c93248-1c8a-4b6c-aba7-91432fd9bb72@github.com> On Mon, 18 Jul 2022 10:04:03 GMT, Aleksey Shipilev wrote: >> Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call addi/addiw assembler directly.
This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit code for load + add/sub + store sequence adding arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. >> >> We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that it will be more clear in code logic at the call sites. >> >> Testing: tier1 tested on riscv64-linux unmatched board. > > src/hotspot/cpu/riscv/c1_LIRAssembler_arraycopy_riscv.cpp line 60: > >> 58: #ifndef PRODUCT >> 59: if (PrintC1Statistics) { >> 60: __ incrementw(ExternalAddress((address)&Runtime1::_generic_arraycopystub_cnt)); > > Is it really `incrementw`, though? These counter fields are `int`-s, are they still 32-bit on RISC-V? If so, shouldn't it be `incrementl`? @shipilev : Yes, the type of _generic_arraycopystub_cnt is int and it occupies 32-bit in memory on Linux/RISC-V. That's why we use incrementw here which increments a 32-bit memory operand. Note that incrementl works for 64-bit memory operand. Hope that explains. Thanks.
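As a rough illustration of why the operand width matters for a 32-bit `int` counter, here is a plain C++ sketch of a load + add + store sequence at two widths. It is not RISC-V assembler code and makes no claim about how `incrementw` is implemented; `Counters`, `increment32`, `increment64` and `wide_increment_disturbs_neighbour` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Two adjacent 32-bit counters, standing in for neighbouring int fields.
struct Counters {
  int32_t first;
  int32_t second;
};
static_assert(sizeof(Counters) == 8, "sketch assumes no padding");

// 32-bit load + add + store: reads and writes only the 4 bytes at addr.
void increment32(void* addr, int32_t delta) {
  uint32_t v;
  std::memcpy(&v, addr, sizeof v);
  v += static_cast<uint32_t>(delta);  // unsigned arithmetic wraps cleanly
  std::memcpy(addr, &v, sizeof v);
}

// 64-bit load + add + store: reads and writes 8 bytes starting at addr.
void increment64(void* addr, int64_t delta) {
  uint64_t v;
  std::memcpy(&v, addr, sizeof v);
  v += static_cast<uint64_t>(delta);
  std::memcpy(addr, &v, sizeof v);
}

// Applies the 64-bit sequence to the address of a 32-bit counter and reports
// whether the neighbouring counter changed as a side effect.
bool wide_increment_disturbs_neighbour() {
  Counters c{-1, 7};
  increment64(&c, 1);
  return c.second != 7;
}
```

With the 32-bit sequence the neighbouring field is left alone, whereas the 64-bit sequence applied to the same address also rewrites the adjacent four bytes, which is why a counter declared as `int` wants the 4-byte variant.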
------------- PR: https://git.openjdk.org/jdk/pull/9461 From shade at openjdk.org Mon Jul 18 11:46:58 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 18 Jul 2022 11:46:58 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: <1UN6Yhfh84XWZ-8z8QXIxc0gk5sdgdFfQH4iuVibWDw=.86c93248-1c8a-4b6c-aba7-91432fd9bb72@github.com> References: <9u8Xm0dpTJPt_1-RHkVe4F6m6ca2jWuGdxn-PXfVZ5Y=.84109ac5-fb77-47c9-a706-6422242052eb@github.com> <1UN6Yhfh84XWZ-8z8QXIxc0gk5sdgdFfQH4iuVibWDw=.86c93248-1c8a-4b6c-aba7-91432fd9bb72@github.com> Message-ID: On Mon, 18 Jul 2022 11:35:09 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/c1_LIRAssembler_arraycopy_riscv.cpp line 60: >> >>> 58: #ifndef PRODUCT >>> 59: if (PrintC1Statistics) { >>> 60: __ incrementw(ExternalAddress((address)&Runtime1::_generic_arraycopystub_cnt)); >> >> Is it really `incrementw`, though? These counter fields are `int`-s, are they still 32-bit on RISC-V? If so, shouldn't it be `incrementl`? > > @shipilev : Yes, the type of _generic_arraycopystub_cnt is int and it occupies 32-bit in memory on Linux/RISC-V. That's why we use incrementw here which increments a 32-bit memory operand. Note that incrementl works for 64-bit memory operand. Hope that explains. Thanks. Is this RISC-V specific postfix naming? On x86, there is `b`/`w`/`l`/`q` for 1/2/4/8-byte ops, respectively. This is just my curiosity, it does not block the integration. ------------- PR: https://git.openjdk.org/jdk/pull/9461 From jvernee at openjdk.org Mon Jul 18 12:43:11 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 18 Jul 2022 12:43:11 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows Message-ID: This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. 
This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. Out of the ~1100 files that make up hotspot on Windows x64, ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files (mostly linux/posix specific files), so I wanted to gather feedback on this approach before continuing with that. --- To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: PRAGMA_DIAG_PUSH PRAGMA_ALLOW_LOSSY_CONVERSIONS And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: PRAGMA_DIAG_POP 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. 
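The per-file prelude and epilogue described above can be sketched roughly as follows. The macro definitions here are hypothetical stand-ins written for this example; the real definitions in HotSpot may differ. The MSVC branch uses warning C4244 as named in the description, and the GCC branch assumes `-Wconversion` is the corresponding option. The function `low_bits` is likewise invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical definitions of the macros named in the description; these are
// assumptions for the sketch, not copies of the HotSpot headers.
#if defined(_MSC_VER)
  #define PRAGMA_DIAG_PUSH               __pragma(warning(push))
  #define PRAGMA_DIAG_POP                __pragma(warning(pop))
  #define PRAGMA_ALLOW_LOSSY_CONVERSIONS __pragma(warning(disable : 4244))
#elif defined(__GNUC__)
  #define PRAGMA_DIAG_PUSH               _Pragma("GCC diagnostic push")
  #define PRAGMA_DIAG_POP                _Pragma("GCC diagnostic pop")
  #define PRAGMA_ALLOW_LOSSY_CONVERSIONS _Pragma("GCC diagnostic ignored \"-Wconversion\"")
#else
  #define PRAGMA_DIAG_PUSH
  #define PRAGMA_DIAG_POP
  #define PRAGMA_ALLOW_LOSSY_CONVERSIONS
#endif

// Prelude: placed after the last #include of the file.
PRAGMA_DIAG_PUSH
PRAGMA_ALLOW_LOSSY_CONVERSIONS

// An implicit 64-bit to 32-bit conversion of the kind the warning reports.
// Inside the push/pop region it compiles without a diagnostic.
int32_t low_bits(int64_t value) {
  int32_t narrowed = value;  // implicit narrowing conversion
  return narrowed;
}

// Epilogue: placed at the end of a .cpp file, or before the closing header
// guard of a .hpp file, so the suppression does not leak into other files.
PRAGMA_DIAG_POP
```

Because the suppression is scoped by push and pop, removing the two prelude lines and the epilogue line from a file is enough to turn the warning back on for that file once its conversions have been cleaned up.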
[1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 ------------- Commit messages: - Rest of the tests - More test - AArch64 - Disable for tests - Fix apostrophe - Last few manual - Automatics - WIP - More disabled warnings - Enable narrow conversion warnings Changes: https://git.openjdk.org/jdk/pull/9516/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9516&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290373 Stats: 1586 lines in 318 files changed: 1579 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9516.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9516/head:pull/9516 PR: https://git.openjdk.org/jdk/pull/9516 From jwaters at openjdk.org Mon Jul 18 12:43:12 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 18 Jul 2022 12:43:12 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows In-Reply-To: References: Message-ID: <6jYHwEXQDqzoCOx7G0x9td35oeXTqQAbFhbnw9FhiV4=.089830f6-ac2f-4621-b313-e3e5ac584f96@github.com> On Fri, 15 Jul 2022 13:25:57 GMT, Jorn Vernee wrote: > This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. > > Out of the ~1100 files that make up hotspot on Windows x64, ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. > > Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. 
> > I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files (mostly linux/posix specific files), so I wanted to gather feedback on this approach before continuing with that. > > --- > > To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: > > PRAGMA_DIAG_PUSH > PRAGMA_ALLOW_LOSSY_CONVERSIONS > > And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: > > PRAGMA_DIAG_POP > > 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. > > [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 Small question: What's the equivalent option to disable the same warning for gcc? (slightly related to https://bugs.openjdk.org/browse/JDK-8288293) ------------- PR: https://git.openjdk.org/jdk/pull/9516 From jvernee at openjdk.org Mon Jul 18 12:43:13 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 18 Jul 2022 12:43:13 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows In-Reply-To: <6jYHwEXQDqzoCOx7G0x9td35oeXTqQAbFhbnw9FhiV4=.089830f6-ac2f-4621-b313-e3e5ac584f96@github.com> References: <6jYHwEXQDqzoCOx7G0x9td35oeXTqQAbFhbnw9FhiV4=.089830f6-ac2f-4621-b313-e3e5ac584f96@github.com> Message-ID: On Sun, 17 Jul 2022 12:54:32 GMT, Julian Waters wrote: >> This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. 
Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. >> >> Out of the ~1100 files that make up hotspot on Windows x64, ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. >> >> Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. >> >> I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files (mostly linux/posix specific files), so I wanted to gather feedback on this approach before continuing with that. >> >> --- >> >> To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: >> >> PRAGMA_DIAG_PUSH >> PRAGMA_ALLOW_LOSSY_CONVERSIONS >> >> And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: >> >> PRAGMA_DIAG_POP >> >> 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. >> >> [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 > > Small question: What's the equivalent option to disable the same warning for gcc? (slightly related to https://bugs.openjdk.org/browse/JDK-8288293) @TheShermanTanker It's `-Wconversion`. 
You will have to disable the warning for non-hotspot binaries to get the equivalent behavior. See this commit: https://github.com/openjdk/jdk/commit/44bebd84c041d62f374bfb6f61685d86e5e41518 ------------- PR: https://git.openjdk.org/jdk/pull/9516 From fyang at openjdk.org Mon Jul 18 12:49:46 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 18 Jul 2022 12:49:46 GMT Subject: RFR: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: References: <9u8Xm0dpTJPt_1-RHkVe4F6m6ca2jWuGdxn-PXfVZ5Y=.84109ac5-fb77-47c9-a706-6422242052eb@github.com> <1UN6Yhfh84XWZ-8z8QXIxc0gk5sdgdFfQH4iuVibWDw=.86c93248-1c8a-4b6c-aba7-91432fd9bb72@github.com> Message-ID: On Mon, 18 Jul 2022 11:43:03 GMT, Aleksey Shipilev wrote: >> @shipilev : Yes, the type of _generic_arraycopystub_cnt is int and it occupies 32 bits in memory on Linux/RISC-V. That's why we use incrementw here which increments a 32-bit memory operand. Note that incrementl works for a 64-bit memory operand. Hope that explains. Thanks. > > Is this RISC-V specific postfix naming? On x86, there is `b`/`w`/`l`/`q` for 1/2/4/8-byte ops, respectively. This is just my curiosity, it does not block the integration. Yes, the default is for 8-byte ops and we use the 'w' postfix for 4-byte ops on RISC-V. And I think this is a hangover from the aarch64 port [1].
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L1917 ------------- PR: https://git.openjdk.org/jdk/pull/9461 From adinn at openjdk.org Mon Jul 18 12:57:13 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 18 Jul 2022 12:57:13 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v9] In-Reply-To: References: Message-ID: <71ywCGYrdF23n8wFjgNngQwaao2ysnLjAO8TeElqpSo=.9d2cec63-22b4-44ed-9626-3b9b52366656@github.com> On Mon, 18 Jul 2022 10:33:33 GMT, Andrew Haley wrote: >> Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: >> >> - 8289743: AArch64: Clean up patching logic >> - 8289743: AArch64: Clean up patching logic >> - 8289743: AArch64: Clean up patching logic > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 461: > >> 459: uint32_t insn = ((uint32_t*)insn_addr)[0]; >> 460: int *insn3_addr = &((int*)insn_addr)[2]; >> 461: uint32_t insn3 = (uint32_t)SafeFetch32(insn3_addr, -1); > > I'm wondering if this is safe. Maybe something like `adrp;movk` could be followed by not-an-instruction which looked like an offset. However, I think that's impossible because anything following would be executed immediately following the `movk`. For the same reason, the `adrp;movk` can't be at the very end of an executable page, so I suppose the use of `SafeFetch32` is unnecessary too. > It's still a code smell, though, even if it is safe. Yes, I agree that the word succeeding the movk has to be an instruction when this is code generated by the JIT. That implies as a clear corollary that the `adrp;movk` sequence can't be at the very end of an executable page. So, the use of SafeFetch32 is unnecessary. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From fyang at openjdk.org Mon Jul 18 13:06:01 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 18 Jul 2022 13:06:01 GMT Subject: Integrated: JDK-8290137: riscv: small refactoring for add_memory_int32/64 In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 08:14:05 GMT, Fei Yang wrote: > Currently, add_memory_int32/64 for riscv can only add a sign-extended 12-bit immediate to memory since they call the addi/addiw assembler directly. This constraint could be relaxed when the given memory address is in the expected form: base register plus a sign-extended 12-bit offset. In this case, we can emit a load + add/sub + store sequence that adds an arbitrary immediate to memory with no more than two scratch registers (t0 and t1) available. > > We could also refactor these two functions into four separate functions: increment, incrementw, decrement and decrementw, so that the code logic is clearer at the call sites. > > Testing: tier1 tested on riscv64-linux unmatched board. This pull request has now been integrated. Changeset: 92067e20 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/92067e200346c41c2f43763edc01c97c7da1a9e6 Stats: 87 lines in 8 files changed: 40 ins; 0 del; 47 mod 8290137: riscv: small refactoring for add_memory_int32/64 Reviewed-by: yadongwang, fjiang, shade ------------- PR: https://git.openjdk.org/jdk/pull/9461 From aph at openjdk.org Mon Jul 18 13:36:46 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 18 Jul 2022 13:36:46 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: Message-ID: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8289743: AArch64: Clean up patching logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9398/files - new: https://git.openjdk.org/jdk/pull/9398/files/7b647d17..64ced790 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9398&range=11-12 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9398.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9398/head:pull/9398 PR: https://git.openjdk.org/jdk/pull/9398 From mbaesken at openjdk.org Mon Jul 18 15:07:07 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 18 Jul 2022 15:07:07 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v3] In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 17:10:37 GMT, Erik Gahlin wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> Bring back JitRestart event, add codeCacheMaxCapacity > > src/hotspot/share/code/codeCache.cpp line 1369: > >> 1367: event.set_unallocatedCapacity(heap->unallocated_capacity()); >> 1368: event.set_fullCount(heap->full_count()); >> 1369: event.set_codeCacheMaxCapacity(CodeCache::max_capacity()); > > Add test of field in TestCodeCacheFull I added an assertion in [test/jdk/jdk/jfr/event/compiler/TestCodeCacheFull.java](https://github.com/openjdk/jdk/pull/9334/files#diff-345e4e5d768e63111b1ba2bf07f63b9835950742daca683d74583014164ff632) ------------- PR: https://git.openjdk.org/jdk/pull/9334 From coleenp at openjdk.org Mon Jul 18 15:10:47 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 18 Jul 2022 15:10:47 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v3] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 15:09:07 GMT, Coleen Phillimore wrote: >> I added an assert if 
Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't throw OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. I actually haven't changed the behavior, because in the case of truncation the old code would throw the truncation exception instead of "bad class index". It is more correct. ------------- PR: https://git.openjdk.org/jdk/pull/9492 From lucy at openjdk.org Mon Jul 18 15:18:41 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 18 Jul 2022 15:18:41 GMT Subject: RFR: JDK-8289524: Add JFR JIT restart event [v8] In-Reply-To: References: Message-ID: <6Vk17zWFvC2vlXF20Q1rq2y3ZWuCWXyQHnl7DMgkYKE=.35e89a04-a682-41d7-be56-b156dfa9cceb@github.com> On Mon, 18 Jul 2022 08:15:29 GMT, Matthias Baesken wrote: >> The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > BlobType got a new package too With all the fine-tuning, changes look even better now. And there is a test now so we get alerted if anything breaks. Thanks! ------------- Marked as reviewed by lucy (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9334 From jjg at openjdk.org Mon Jul 18 15:24:06 2022 From: jjg at openjdk.org (Jonathan Gibbons) Date: Mon, 18 Jul 2022 15:24:06 GMT Subject: [jdk19] RFR: 8278274: Update nroff pages in JDK 19 before RC In-Reply-To: References: Message-ID: <75mqpwhSuTfYsLguWkG0izRelatkAX7wOwsjC3crYFI=.2449d017-855e-4da9-b1b3-2fcee65cba2c@github.com> On Sun, 17 Jul 2022 22:44:02 GMT, David Holmes wrote: > Please review these changes to the nroff manpage files so that they match their markdown sources that Oracle maintains. > > All pages at a minimum have 19-ea replaced with 19, and copyright set to 2022 if needed. Additionally: > > The Java manpage was missing updates from: > - [JDK-8282018](https://bugs.openjdk.org/browse/JDK-8282018): Add captions to tables on java man page. > > The Java manpage has slight formatting differences from: > - [JDK-8262004](https://bugs.openjdk.org/browse/JDK-8262004): Classpath separator: Man page says semicolon; should be colon on Linux > - [JDK-8236569](https://bugs.openjdk.org/browse/JDK-8236569): -Xss not multiple of 4K does not work for the main thread on macOS > > The Java manpage has a typo fixed in mainline by [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) (for JDK 20) > > > The keytool manpage was missing updates from: > - [JDK-8282014](https://bugs.openjdk.org/browse/JDK-8282014): Add captions to tables on keytool man page. > - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA > > The jar manpage was missing updates from: > - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) > > The jarsigner manpage was missing updates from: > - [JDK-8282015](https://bugs.openjdk.org/browse/JDK-8282015): Add captions to tables on jarsigner man page. 
> - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA > > The javadoc manpage was missing updates from: > - [JDK-8279034](https://bugs.openjdk.org/browse/JDK-8279034): Update man page for javadoc `--date` option > > The jmod manpage was missing updates from: > - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) > > The jpackage manpage was missing updates from: > - [JDK-8285146](https://bugs.openjdk.org/browse/JDK-8285146): Document jpackage resource dir feature > - [JDK-8284695](https://bugs.openjdk.org/browse/JDK-8284695): Update jpackage man pages for JDK 19 > - [JDK-8284209](https://bugs.openjdk.org/browse/JDK-8284209): Replace remaining usages of 'a the' in source code > > The jshell manpage was missing updates from: > - [JDK-8282016](https://bugs.openjdk.org/browse/JDK-8282016): Add captions to tables on jshell man page. src/java.base/share/man/keytool.1 line 456: > 454: \f[CB]PrivateKeyEntry\f[R] for the signer that already exists in the > 455: keystore. > 456: This option is used to sign the certificate with the signer?s private Not a problem with this PR as such, but we still have a `?` character in the output. ------------- PR: https://git.openjdk.org/jdk19/pull/145 From jjg at openjdk.org Mon Jul 18 15:30:09 2022 From: jjg at openjdk.org (Jonathan Gibbons) Date: Mon, 18 Jul 2022 15:30:09 GMT Subject: [jdk19] RFR: 8278274: Update nroff pages in JDK 19 before RC In-Reply-To: References: Message-ID: On Sun, 17 Jul 2022 22:44:02 GMT, David Holmes wrote: > Please review these changes to the nroff manpage files so that they match their markdown sources that Oracle maintains. > > All pages at a minimum have 19-ea replaced with 19, and copyright set to 2022 if needed. 
Additionally: > > The Java manpage was missing updates from: > - [JDK-8282018](https://bugs.openjdk.org/browse/JDK-8282018): Add captions to tables on java man page. > > The Java manpage has slight formatting differences from: > - [JDK-8262004](https://bugs.openjdk.org/browse/JDK-8262004): Classpath separator: Man page says semicolon; should be colon on Linux > - [JDK-8236569](https://bugs.openjdk.org/browse/JDK-8236569): -Xss not multiple of 4K does not work for the main thread on macOS > > The Java manpage has a typo fixed in mainline by [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) (for JDK 20) > > > The keytool manpage was missing updates from: > - [JDK-8282014](https://bugs.openjdk.org/browse/JDK-8282014): Add captions to tables on keytool man page. > - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA > > The jar manpage was missing updates from: > - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) > > The jarsigner manpage was missing updates from: > - [JDK-8282015](https://bugs.openjdk.org/browse/JDK-8282015): Add captions to tables on jarsigner man page. 
> - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA > > The javadoc manpage was missing updates from: > - [JDK-8279034](https://bugs.openjdk.org/browse/JDK-8279034): Update man page for javadoc `--date` option > > The jmod manpage was missing updates from: > - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) > > The jpackage manpage was missing updates from: > - [JDK-8285146](https://bugs.openjdk.org/browse/JDK-8285146): Document jpackage resource dir feature > - [JDK-8284695](https://bugs.openjdk.org/browse/JDK-8284695): Update jpackage man pages for JDK 19 > - [JDK-8284209](https://bugs.openjdk.org/browse/JDK-8284209): Replace remaining usages of 'a the' in source code > > The jshell manpage was missing updates from: > - [JDK-8282016](https://bugs.openjdk.org/browse/JDK-8282016): Add captions to tables on jshell man page. The version changes in each file look good (`19-ea` to `19`). The changes for javadoc look good. I looked over the other changes for other files, and while they look good, I cannot speak for their technical accuracy. That being said, this is an automated process deriving info from upstream, so is likely OK. ------------- Marked as reviewed by jjg (Reviewer). PR: https://git.openjdk.org/jdk19/pull/145 From aph at openjdk.org Mon Jul 18 16:13:01 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 18 Jul 2022 16:13:01 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Mon, 18 Jul 2022 13:36:46 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. 
By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic Clean tier1 linux-aarch64-server-fastdebug, macosx-aarch64-server-fastdebug. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From adinn at openjdk.org Mon Jul 18 16:37:08 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 18 Jul 2022 16:37:08 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Mon, 18 Jul 2022 13:36:46 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic Looks ok to me. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9398 From lmesnik at openjdk.org Mon Jul 18 18:16:24 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 18 Jul 2022 18:16:24 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v9] In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: > The tests are updated so that they don't use Thread.stop(). Tests whose intention is to verify async exceptions are updated to use JVMTI StopThread.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9505/files - new: https://git.openjdk.org/jdk/pull/9505/files/628ff010..87065cc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9505&range=07-08 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9505.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9505/head:pull/9505 PR: https://git.openjdk.org/jdk/pull/9505 From dcubed at openjdk.org Mon Jul 18 20:35:43 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 18 Jul 2022 20:35:43 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v9] In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: <8F85NTdqWitO5L0ZXuQy-YY3vKsFthTKxf5wErmiCks=.f960750f-3451-41dc-9df3-6207baad8ee0@github.com> On Mon, 18 Jul 2022 18:16:24 GMT, Leonid Mesnik wrote: >> The tests are updated so that they don't use Thread.stop(). Tests whose intention is to verify async exceptions are updated to use JVMTI StopThread. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > clean up Thumbs up. Did you decide that it was okay if the mallocWithGC2.java test didn't wait for the cHeapEater thread to finish? ------------- Marked as reviewed by dcubed (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Mon Jul 18 20:58:32 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 18 Jul 2022 20:58:32 GMT Subject: RFR: 8289612: Change hotspot/jtreg tests to not use Thread.stop [v9] In-Reply-To: References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Mon, 18 Jul 2022 18:16:24 GMT, Leonid Mesnik wrote: >> The tests are updated so that they don't use Thread.stop(). Tests whose intention is to verify async exceptions are updated to use JVMTI StopThread. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > clean up Yep, the goal of this thread is just to call malloc/free in a loop. It doesn't need to wait for completion. ------------- PR: https://git.openjdk.org/jdk/pull/9505 From lmesnik at openjdk.org Mon Jul 18 21:56:45 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 18 Jul 2022 21:56:45 GMT Subject: Integrated: 8289612: Change hotspot/jtreg tests to not use Thread.stop In-Reply-To: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> References: <0u2bxw3TOmGWO0nFms-ztAAFvvQDuKFrkAh5VYmP-pg=.4ba4b076-9692-4d52-9464-6d65f80db4f3@github.com> Message-ID: On Fri, 15 Jul 2022 00:11:16 GMT, Leonid Mesnik wrote: > The tests are updated so that they don't use Thread.stop(). Tests whose intention is to verify async exceptions are updated to use JVMTI StopThread. This pull request has now been integrated.
Changeset: 5a96a5db Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/5a96a5db13992118ec384207edfb04136f339253 Stats: 164 lines in 7 files changed: 116 ins; 24 del; 24 mod 8289612: Change hotspot/jtreg tests to not use Thread.stop Reviewed-by: dholmes, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/9505 From dholmes at openjdk.org Mon Jul 18 23:26:00 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Jul 2022 23:26:00 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v3] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 15:09:07 GMT, Coleen Phillimore wrote: >> I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't throw OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. Thumbs up from me now - sorry it took me a while to "get it". Thanks. Ah, now I see, sorry. new_exception would have ignored the requested exception and thrown the pending truncation exception instead. Now it is much clearer that this is what happens. src/hotspot/share/utilities/exceptions.cpp line 361: > 359: thread->clear_pending_exception(); > 360: ResourceMark rm(thread); > 361: assert(incoming_exception.is_null(), "Pending exception while throwing %s %s", name->as_C_string(), message); You may as well leave this as `false` ------------- Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9492 From dholmes at openjdk.org Mon Jul 18 23:32:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Jul 2022 23:32:04 GMT Subject: [jdk19] RFR: 8278274: Update nroff pages in JDK 19 before RC In-Reply-To: References: Message-ID: On Mon, 18 Jul 2022 15:26:23 GMT, Jonathan Gibbons wrote: >> Please review these changes to the nroff manpage files so that they match their markdown sources that Oracle maintains. >> >> All pages at a minimum have 19-ea replaced with 19, and copyright set to 2022 if needed. Additionally: >> >> The Java manpage was missing updates from: >> - [JDK-8282018](https://bugs.openjdk.org/browse/JDK-8282018): Add captions to tables on java man page. >> >> The Java manpage has slight formatting differences from: >> - [JDK-8262004](https://bugs.openjdk.org/browse/JDK-8262004): Classpath separator: Man page says semicolon; should be colon on Linux >> - [JDK-8236569](https://bugs.openjdk.org/browse/JDK-8236569): -Xss not multiple of 4K does not work for the main thread on macOS >> >> The Java manpage has a typo fixed in mainline by [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) (for JDK 20) >> >> >> The keytool manpage was missing updates from: >> - [JDK-8282014](https://bugs.openjdk.org/browse/JDK-8282014): Add captions to tables on keytool man page. >> - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA >> >> The jar manpage was missing updates from: >> - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) >> >> The jarsigner manpage was missing updates from: >> - [JDK-8282015](https://bugs.openjdk.org/browse/JDK-8282015): Add captions to tables on jarsigner man page. 
>> - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA >> >> The javadoc manpage was missing updates from: >> - [JDK-8279034](https://bugs.openjdk.org/browse/JDK-8279034): Update man page for javadoc `--date` option >> >> The jmod manpage was missing updates from: >> - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) >> >> The jpackage manpage was missing updates from: >> - [JDK-8285146](https://bugs.openjdk.org/browse/JDK-8285146): Document jpackage resource dir feature >> - [JDK-8284695](https://bugs.openjdk.org/browse/JDK-8284695): Update jpackage man pages for JDK 19 >> - [JDK-8284209](https://bugs.openjdk.org/browse/JDK-8284209): Replace remaining usages of 'a the' in source code >> >> The jshell manpage was missing updates from: >> - [JDK-8282016](https://bugs.openjdk.org/browse/JDK-8282016): Add captions to tables on jshell man page. > > The version changes in each file look good (`19-ea` to `19`). > The changes for javadoc look good. > > I looked over the other changes for other files, and while they look good, I cannot speak for their technical accuracy. That being said, this is an automated process deriving info from upstream, so is likely OK. Thanks for the review @jonathan-gibbons ! I'll wait a day in case there are any further comments. > src/java.base/share/man/keytool.1 line 456: > >> 454: \f[CB]PrivateKeyEntry\f[R] for the signer that already exists in the >> 455: keystore. >> 456: This option is used to sign the certificate with the signer?s private > > Not a problem with this PR as such, but we still have a `?` character in the output. Yeah I spotted that too, but it would need to be fixed in source and nroff. Must be some kind of "smart quote" from an editor. Do you think this needs to be fixed or just handle it in mainline? 
------------- PR: https://git.openjdk.org/jdk19/pull/145 From shade at openjdk.org Tue Jul 19 05:56:26 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Jul 2022 05:56:26 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions Message-ID: See the rationale in the bug. The test time improves: # Before real 1m27.397s user 2m39.937s sys 0m5.966s # After real 1m13.443s ; -16% user 2m24.238s ; -10% sys 0m5.885s ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/9548/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9548&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290495 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9548.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9548/head:pull/9548 PR: https://git.openjdk.org/jdk/pull/9548 From dholmes at openjdk.org Tue Jul 19 06:29:53 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 19 Jul 2022 06:29:53 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions In-Reply-To: References: Message-ID: <5qPJMu3i6qce8KkYKvzgY6GCW2ESKTnYA73hNzmL45Q=.517c2c2e-b87c-4fc7-8e4e-2d7668a29875@github.com> On Tue, 19 Jul 2022 05:47:59 GMT, Aleksey Shipilev wrote: > See the rationale in the bug. > > The test time improves: > > > # Before > real 1m27.397s > user 2m39.937s > sys 0m5.966s > > # After > real 1m13.443s ; -16% > user 2m24.238s ; -10% > sys 0m5.885s As a general rule we don't care about the performance of non-product code. If you start down this path there is a huge amount of potential optimisation that might be done. Why is this one worthwhile? Cheers. 
------------- PR: https://git.openjdk.org/jdk/pull/9548 From shade at openjdk.org Tue Jul 19 06:32:05 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Jul 2022 06:32:05 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions In-Reply-To: <5qPJMu3i6qce8KkYKvzgY6GCW2ESKTnYA73hNzmL45Q=.517c2c2e-b87c-4fc7-8e4e-2d7668a29875@github.com> References: <5qPJMu3i6qce8KkYKvzgY6GCW2ESKTnYA73hNzmL45Q=.517c2c2e-b87c-4fc7-8e4e-2d7668a29875@github.com> Message-ID: On Tue, 19 Jul 2022 06:26:34 GMT, David Holmes wrote: > As a general rule we don't care about the performance of non-product code. If you start down this path there is a huge amount of potential optimisation that might be done. Why is this one worthwhile? I run lots of testing, so I am interested in simple stuff that optimizes test times. The more generic solution, #9543, yields a substantial reduction in test times, but Ioi wanted this one too. ------------- PR: https://git.openjdk.org/jdk/pull/9548 From mbaesken at openjdk.org Tue Jul 19 07:12:06 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 19 Jul 2022 07:12:06 GMT Subject: Integrated: JDK-8289524: Add JFR JIT restart event In-Reply-To: References: Message-ID: <0fKBxAbjozcML8dW_5OP609hXt52bh-x6ezRUUHohq0=.0c23378c-527f-4f18-84ed-2281d64a4900@github.com> On Thu, 30 Jun 2022 13:17:09 GMT, Matthias Baesken wrote: > The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart. This pull request has now been integrated.
Changeset: dfbc6919 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/dfbc6919e1e233b42aede97f1323ce5529fab7cf Stats: 134 lines in 8 files changed: 129 ins; 5 del; 0 mod 8289524: Add JFR JIT restart event Reviewed-by: kvn, lucy ------------- PR: https://git.openjdk.org/jdk/pull/9334 From dholmes at openjdk.org Tue Jul 19 07:48:05 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 19 Jul 2022 07:48:05 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions In-Reply-To: References: Message-ID: On Tue, 19 Jul 2022 05:47:59 GMT, Aleksey Shipilev wrote: > See the rationale in the bug. > > The test time improves: > > > # Before > real 1m27.397s > user 2m39.937s > sys 0m5.966s > > # After > real 1m13.443s ; -16% > user 2m24.238s ; -10% > sys 0m5.885s I'll approve just to avoid spending more cycles on this but I'd hate to see this become common place. :) src/hotspot/share/oops/method.cpp line 811: > 809: if (class_access_flags.is_interface() && (is_nonv != is_static()) && (is_nonv != is_private())) { > 810: ResourceMark rm; > 811: fatal("nonvirtual unexpected for non-static, non-private: %s", You could just move the RM and leave the assert as is. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9548 From shade at openjdk.org Tue Jul 19 07:53:57 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Jul 2022 07:53:57 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions [v2] In-Reply-To: References: Message-ID: > See the rationale in the bug. 
> > The test time improves: > > > # Before > real 1m27.397s > user 2m39.937s > sys 0m5.966s > > # After > real 1m13.443s ; -16% > user 2m24.238s ; -10% > sys 0m5.885s Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Just move the ResourceMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9548/files - new: https://git.openjdk.org/jdk/pull/9548/files/96d1e023..7a1b0d85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9548&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9548&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9548.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9548/head:pull/9548 PR: https://git.openjdk.org/jdk/pull/9548 From shade at openjdk.org Tue Jul 19 07:53:59 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Jul 2022 07:53:59 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions [v2] In-Reply-To: References: Message-ID: On Tue, 19 Jul 2022 07:44:36 GMT, David Holmes wrote: > I'll approve just to avoid spending more cycles on this but I'd hate to see this become common place. :) True, I am on the fence about fiddling about every `ResourceMark` use. That's why I did the more generic thing in that other PR, so we can slap `ResourceMark`-s pretty much wherever without obsessing about these details. :) > src/hotspot/share/oops/method.cpp line 811: > >> 809: if (class_access_flags.is_interface() && (is_nonv != is_static()) && (is_nonv != is_private())) { >> 810: ResourceMark rm; >> 811: fatal("nonvirtual unexpected for non-static, non-private: %s", > > You could just move the RM and leave the assert as is. Ok, we can do that. 
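For readers following the exchange above, here is a minimal, self-contained sketch of why the placement of a scoped marker matters. It mirrors only the shape of the quoted `method.cpp` hunk, where the mark is created inside the failing branch. `ScopedMark`, `check_eager` and `check_lazy` are hypothetical names invented for illustration; this is not HotSpot's `ResourceMark` and not the code from the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a scoped resource marker. It only counts how
// many times it is constructed, so the cost of each placement is visible.
struct ScopedMark {
  static int constructed;
  ScopedMark() { ++constructed; }
};
int ScopedMark::constructed = 0;

// Eager placement: the mark is constructed on every call,
// even when the check passes and no message is ever built.
bool check_eager(bool ok, std::string* msg) {
  assert(msg != nullptr);
  ScopedMark mark;
  if (!ok) {
    *msg = "nonvirtual unexpected";
    return false;
  }
  return true;
}

// Lazy placement: the mark is constructed only on the failure path,
// which is the only place where the message is needed.
bool check_lazy(bool ok, std::string* msg) {
  assert(msg != nullptr);
  if (!ok) {
    ScopedMark mark;
    *msg = "nonvirtual unexpected";
    return false;
  }
  return true;
}
```

Under these assumptions, calling `check_lazy` with a passing condition never constructs a mark, while `check_eager` constructs one per call; in a check that runs very often, that per-call cost is what the discussion above is about.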
------------- PR: https://git.openjdk.org/jdk/pull/9548 From stuefe at openjdk.org Tue Jul 19 08:04:05 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 19 Jul 2022 08:04:05 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions [v2] In-Reply-To: References: Message-ID: On Tue, 19 Jul 2022 07:53:57 GMT, Aleksey Shipilev wrote: >> See the rationale in the bug. >> >> The test time improves: >> >> >> # Before >> real 1m27.397s >> user 2m39.937s >> sys 0m5.966s >> >> # After >> real 1m13.443s ; -16% >> user 2m24.238s ; -10% >> sys 0m5.885s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Just move the ResourceMark Makes sense. ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/9548 From ngasson at openjdk.org Tue Jul 19 08:54:50 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Tue, 19 Jul 2022 08:54:50 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Mon, 18 Jul 2022 13:36:46 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic I ran tier1-3 without issue. ------------- Marked as reviewed by ngasson (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Tue Jul 19 08:54:50 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Jul 2022 08:54:50 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v9] In-Reply-To: <71ywCGYrdF23n8wFjgNngQwaao2ysnLjAO8TeElqpSo=.9d2cec63-22b4-44ed-9626-3b9b52366656@github.com> References: <71ywCGYrdF23n8wFjgNngQwaao2ysnLjAO8TeElqpSo=.9d2cec63-22b4-44ed-9626-3b9b52366656@github.com> Message-ID: On Mon, 18 Jul 2022 12:54:57 GMT, Andrew Dinn wrote: > Yes, I agree that the word succeeding the movk has to be an instruction when this is code generated by the JIT. Mmm, but there are non-JIT uses of ADRP. Nonetheless, I think we're OK. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Tue Jul 19 08:59:48 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Jul 2022 08:59:48 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Tue, 19 Jul 2022 08:50:00 GMT, Nick Gasson wrote: > I ran tier1-3 without issue. Great, thanks for doing that. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Tue Jul 19 10:13:59 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Jul 2022 10:13:59 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 07:03:58 GMT, Andrew Haley wrote: > To be on the safe side I'm putting this through our internal testing. Please hold off integrating until I give it the green light. Thanks. Would you like to do this again, please? It's been largely rewritten after review feedback. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From dholmes at openjdk.org Tue Jul 19 12:13:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 19 Jul 2022 12:13:04 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions [v2] In-Reply-To: References: Message-ID: On Tue, 19 Jul 2022 07:53:57 GMT, Aleksey Shipilev wrote: >> See the rationale in the bug. >> >> The test time improves: >> >> >> # Before >> real 1m27.397s >> user 2m39.937s >> sys 0m5.966s >> >> # After >> real 1m13.443s ; -16% >> user 2m24.238s ; -10% >> sys 0m5.885s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Just move the ResourceMark Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9548 From eosterlund at openjdk.org Tue Jul 19 14:44:31 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 19 Jul 2022 14:44:31 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 Message-ID: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> The MacroAssembler::verified_entry method is a C2 function. It belongs in C2_MacroAssembler. This patch moves it there on x86. 
------------- Commit messages: - 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 Changes: https://git.openjdk.org/jdk/pull/9558/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9558&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290534 Stats: 195 lines in 5 files changed: 102 ins; 92 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9558/head:pull/9558 PR: https://git.openjdk.org/jdk/pull/9558 From coleenp at openjdk.org Tue Jul 19 14:44:52 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 19 Jul 2022 14:44:52 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v3] In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 15:09:07 GMT, Coleen Phillimore wrote: >> I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > second get_user_name_slow call should CHECK_NULL too. Thanks David for the review and for battling the confusion this code provoked. Hopefully in the end this helps. 
------------- PR: https://git.openjdk.org/jdk/pull/9492 From coleenp at openjdk.org Tue Jul 19 14:44:55 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 19 Jul 2022 14:44:55 GMT Subject: RFR: 8272096: Exceptions::new_exception can return wrong exception [v3] In-Reply-To: References: Message-ID: On Mon, 18 Jul 2022 05:15:42 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> second get_user_name_slow call should CHECK_NULL too. > > src/hotspot/share/utilities/exceptions.cpp line 361: > >> 359: thread->clear_pending_exception(); >> 360: ResourceMark rm(thread); >> 361: assert(incoming_exception.is_null(), "Pending exception while throwing %s %s", name->as_C_string(), message); > > You may as well leave this as `false` Having the condition there seems nicer when you get the assert than false when I was testing it. I wish there was a version of fatal that wasn't product mode. ------------- PR: https://git.openjdk.org/jdk/pull/9492 From coleenp at openjdk.org Tue Jul 19 14:44:57 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 19 Jul 2022 14:44:57 GMT Subject: Integrated: 8272096: Exceptions::new_exception can return wrong exception In-Reply-To: References: Message-ID: On Thu, 14 Jul 2022 12:49:34 GMT, Coleen Phillimore wrote: > I added an assert if Exceptions::new_exception is called with a pending exception and fixed the places where it is called with a pending exception. That leaves only two possible exceptions. I left the product mode code in to return the pending exception if allocating the exception message doesn't thrown OOM because it was always there and seems dubious. Tested with jck tests and tier1-7. This pull request has now been integrated. 
Changeset: bbc57483 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/bbc57483ce4904efa5ff4c8384a74ee7f8776317 Stats: 29 lines in 4 files changed: 6 ins; 5 del; 18 mod 8272096: Exceptions::new_exception can return wrong exception Reviewed-by: hseigel, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/9492 From eosterlund at openjdk.org Tue Jul 19 14:53:12 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 19 Jul 2022 14:53:12 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 [v2] In-Reply-To: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> Message-ID: > The MacroAssembler::verified_entry method is a C2 function. It belongs in C2_MacroAssembler. This patch moves it there on x86. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Fix 32 bit build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9558/files - new: https://git.openjdk.org/jdk/pull/9558/files/a84d148d..14452a9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9558&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9558&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9558/head:pull/9558 PR: https://git.openjdk.org/jdk/pull/9558 From shade at openjdk.org Tue Jul 19 14:53:13 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Jul 2022 14:53:13 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 [v2] In-Reply-To: References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> Message-ID: 
<2Zahm1fY_9sh2a2NipGB9fnUL0PfPh2dSenqCP9ablo=.2e127c97-a987-4aa5-ba4a-89ab854a45ae@github.com> On Tue, 19 Jul 2022 14:50:22 GMT, Erik Österlund wrote: >> The MacroAssembler::verified_entry method is a C2 function. It belongs in C2_MacroAssembler. This patch moves it there on x86. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32 bit build Looks fine. I have not verified the move was line-to-line exact, but I assume that was a mechanical copy-paste. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/9558 From eosterlund at openjdk.org Tue Jul 19 14:59:04 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 19 Jul 2022 14:59:04 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 [v2] In-Reply-To: <2Zahm1fY_9sh2a2NipGB9fnUL0PfPh2dSenqCP9ablo=.2e127c97-a987-4aa5-ba4a-89ab854a45ae@github.com> References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> <2Zahm1fY_9sh2a2NipGB9fnUL0PfPh2dSenqCP9ablo=.2e127c97-a987-4aa5-ba4a-89ab854a45ae@github.com> Message-ID: On Tue, 19 Jul 2022 14:48:10 GMT, Aleksey Shipilev wrote: > Looks fine. > > I have not verified the move was line-to-line exact, but I assume that was a mechanical copy-paste. Thanks @shipilev. It was indeed a mechanical copy-paste. Does that make it... trivial? 
:o ------------- PR: https://git.openjdk.org/jdk/pull/9558 From shade at openjdk.org Tue Jul 19 14:59:06 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Jul 2022 14:59:06 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 [v2] In-Reply-To: <2Zahm1fY_9sh2a2NipGB9fnUL0PfPh2dSenqCP9ablo=.2e127c97-a987-4aa5-ba4a-89ab854a45ae@github.com> References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> <2Zahm1fY_9sh2a2NipGB9fnUL0PfPh2dSenqCP9ablo=.2e127c97-a987-4aa5-ba4a-89ab854a45ae@github.com> Message-ID: On Tue, 19 Jul 2022 14:48:10 GMT, Aleksey Shipilev wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 32 bit build > > Looks fine. > > I have not verified the move was line-to-line exact, but I assume that was a mechanical copy-paste. > Thanks @shipilev. It was indeed a mechanical copy-paste. Does that make it... trivial? :o Let's see what GHA says about *that*! Declarations/definitions move might yield surprise build failures. ------------- PR: https://git.openjdk.org/jdk/pull/9558 From pchilanomate at openjdk.org Tue Jul 19 15:24:05 2022 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 19 Jul 2022 15:24:05 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order [v3] In-Reply-To: References: Message-ID: On Sat, 16 Jul 2022 15:22:56 GMT, Coleen Phillimore wrote: >> Most of the analysis in the CR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > More Kim improvements. Looks good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9515 From kvn at openjdk.org Tue Jul 19 15:47:11 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 19 Jul 2022 15:47:11 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 [v2] In-Reply-To: References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> Message-ID: <7YJGXR-uvUjUzlHTw9y1MLgukN89UIAYTc62BCrrVSw=.78b04ec1-e445-479c-bee6-b3ac8c5871ca@github.com> On Tue, 19 Jul 2022 14:53:12 GMT, Erik Österlund wrote: >> The MacroAssembler::verified_entry method is a C2 function. It belongs in C2_MacroAssembler. This patch moves it there on x86. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32 bit build Changes are good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9558 From eosterlund at openjdk.org Tue Jul 19 15:57:10 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 19 Jul 2022 15:57:10 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 [v2] In-Reply-To: <7YJGXR-uvUjUzlHTw9y1MLgukN89UIAYTc62BCrrVSw=.78b04ec1-e445-479c-bee6-b3ac8c5871ca@github.com> References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> <7YJGXR-uvUjUzlHTw9y1MLgukN89UIAYTc62BCrrVSw=.78b04ec1-e445-479c-bee6-b3ac8c5871ca@github.com> Message-ID: On Tue, 19 Jul 2022 15:43:03 GMT, Vladimir Kozlov wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 32 bit build > > Changes are good. Thanks for the review, @vnkozlov ! 
------------- PR: https://git.openjdk.org/jdk/pull/9558 From coleenp at openjdk.org Tue Jul 19 16:32:04 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 19 Jul 2022 16:32:04 GMT Subject: RFR: 8227060: Optimize safepoint cleanup subtask order [v3] In-Reply-To: References: Message-ID: On Sat, 16 Jul 2022 15:22:56 GMT, Coleen Phillimore wrote: >> Most of the analysis in the CR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > More Kim improvements. Thank you, Kim and Patricio! ------------- PR: https://git.openjdk.org/jdk/pull/9515 From coleenp at openjdk.org Tue Jul 19 16:35:07 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 19 Jul 2022 16:35:07 GMT Subject: Integrated: 8227060: Optimize safepoint cleanup subtask order In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 12:36:27 GMT, Coleen Phillimore wrote: > Most of the analysis in the CR is for code that's removed, but I found one safepoint cleanup task that's unused. Also the dictionary resizing and symbol/string table rehashing, while rare, could take a long time so I moved them sooner in the list. > Tested with tier1-3. This pull request has now been integrated. 
Changeset: 96a542fe Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/96a542feb2064dba155ebf05311752995d164038 Stats: 60 lines in 5 files changed: 20 ins; 34 del; 6 mod 8227060: Optimize safepoint cleanup subtask order Reviewed-by: kbarrett, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/9515 From jwilhelm at openjdk.org Tue Jul 19 16:51:55 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Tue, 19 Jul 2022 16:51:55 GMT Subject: RFR: Merge jdk19 Message-ID: Forwardport JDK 19 -> JDK 20 ------------- Commit messages: - Merge remote-tracking branch 'jdk19/master' into Merge_jdk19 - 8290524: Typo in javadoc of MemorySegment/MemoryAddress - 8288482: JFR: Cannot resolve method - 8290417: CDS cannot archive lamda proxy with useImplMethodHandle The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=jdk&pr=9561&range=00.0 - jdk19: https://webrevs.openjdk.org/?repo=jdk&pr=9561&range=00.1 Changes: https://git.openjdk.org/jdk/pull/9561/files Stats: 510 lines in 10 files changed: 437 ins; 0 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/9561.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9561/head:pull/9561 PR: https://git.openjdk.org/jdk/pull/9561 From duke at openjdk.org Tue Jul 19 17:30:42 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Tue, 19 Jul 2022 17:30:42 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines [v3] In-Reply-To: References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: On Sat, 9 Jul 2022 03:06:30 GMT, Yi-Fan Tsai wrote: >> A trampoline stub could be generated for each runtime call. These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. 
>> >> Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Use a hash table to deduplicate Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 29: > 27: #include "asm/macroAssembler.hpp" > 28: > 29: void CodeBuffer::share_trampoline_for(address dest, int caller_offset) { Shouldn't it be `shared_trampoline_for` to be consistent with `shared_stub_to_interp_for`? src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 38: > 36: } > 37: > 38: static bool emit_shared_stubs_to_runtime_call(CodeBuffer* cb, CodeBuffer::SharedTrampolineRequests* requests) { emit_shared_trampoline src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 56: > 54: prev.at(i) = last.at(j); > 55: last.at(j) = i; > 56: } Why not to implement `SharedTrampolineRequests` as a simple hash table? It can be based on: `GrowableArray>>` Is it writing simple hash table code better than using sorting? If a number of trampolines is small there should not be much performance difference. IMHO, less code is better. How many trampolines per method are created? 
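[Archive note] The review comment above asks whether a simple hash table keyed by callee would be less code than sorting the requests. The following is a minimal sketch of that idea in standard C++, under stated assumptions: `TrampolineRequests`, `Address`, `add`, `distinct_callees` and `callers_of` are hypothetical names, and `std::unordered_map` stands in for whatever HotSpot container the PR would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in for HotSpot's `address` type (assumption for this sketch).
using Address = std::uintptr_t;

// Collects (destination, caller offset) requests and groups them by
// destination, so that one stub per distinct callee could be emitted later.
class TrampolineRequests {
 public:
  // Records that the call site at `caller_offset` targets `dest`.
  void add(Address dest, int caller_offset) {
    by_dest_[dest].push_back(caller_offset);
  }

  // Number of stubs that would be needed: one per distinct callee.
  std::size_t distinct_callees() const { return by_dest_.size(); }

  // Call sites sharing the stub for `dest`, in insertion order.
  // Throws std::out_of_range if `dest` was never requested.
  const std::vector<int>& callers_of(Address dest) const {
    return by_dest_.at(dest);
  }

 private:
  std::unordered_map<Address, std::vector<int>> by_dest_;
};
```

Whether this beats sorting in practice depends on how many trampolines a method typically needs, which is the open question raised in the review.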
test/hotspot/jtreg/compiler/sharedstubs/SharedStubToRuntimeTest.java line 46: > 44: import jdk.test.lib.process.ProcessTools; > 45: > 46: public class SharedStubToRuntimeTest { SharedTrampolineTest ------------- PR: https://git.openjdk.org/jdk/pull/9405 From dlong at openjdk.org Tue Jul 19 20:16:51 2022 From: dlong at openjdk.org (Dean Long) Date: Tue, 19 Jul 2022 20:16:51 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows In-Reply-To: References: Message-ID: On Fri, 15 Jul 2022 13:25:57 GMT, Jorn Vernee wrote: > This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. > > Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. > > Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. > > I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. > > --- > > To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: > > PRAGMA_DIAG_PUSH > PRAGMA_ALLOW_LOSSY_CONVERSIONS > > And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: > > PRAGMA_DIAG_POP > > 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. 
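[Archive note] The prelude and epilogue described above can be sketched roughly as follows. This is an illustration only: the `SKETCH_*` macros are hypothetical stand-ins for `PRAGMA_DIAG_PUSH`, `PRAGMA_ALLOW_LOSSY_CONVERSIONS` and `PRAGMA_DIAG_POP`, whose real definitions live in HotSpot and in the PR. Mapping the GCC/Clang side to `-Wconversion` is an assumption, and `low_bits` is an invented example function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-compiler definitions (assumptions, not the HotSpot macros).
#if defined(_MSC_VER)
  #define SKETCH_DIAG_PUSH   __pragma(warning(push))
  #define SKETCH_DIAG_POP    __pragma(warning(pop))
  #define SKETCH_ALLOW_LOSSY __pragma(warning(disable : 4244))
#elif defined(__GNUC__)
  #define SKETCH_DIAG_PUSH   _Pragma("GCC diagnostic push")
  #define SKETCH_DIAG_POP    _Pragma("GCC diagnostic pop")
  #define SKETCH_ALLOW_LOSSY _Pragma("GCC diagnostic ignored \"-Wconversion\"")
#else
  #define SKETCH_DIAG_PUSH
  #define SKETCH_DIAG_POP
  #define SKETCH_ALLOW_LOSSY
#endif

// Prelude: placed after the last #include of the file.
SKETCH_DIAG_PUSH
SKETCH_ALLOW_LOSSY

// Code with an implicit 64-bit to 32-bit conversion that would otherwise warn.
inline int low_bits(std::int64_t v) {
  int r = v & 0xFFFF;  // lossy in general, harmless here because of the mask
  return r;
}

// Epilogue: restores the previous warning state.
SKETCH_DIAG_POP
```

The push/pop pair limits the suppression to the code between them, which matters mainly for headers that are included into other translation units.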
There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. > > [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 I thought PRAGMA_DIAG_PUSH/POP are for narrowing the scope of the pragma. So unless we are concatenating .cpp files together, I would think they are not needed for .cpp files if we want to affect the whole file. ------------- PR: https://git.openjdk.org/jdk/pull/9516 From lmesnik at openjdk.org Tue Jul 19 20:53:19 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 19 Jul 2022 20:53:19 GMT Subject: RFR: 8290468: Remove gc/gctests/mallocWithGC tests Message-ID: The gc/gctests/mallocWithGC tests test interaction of GC with malloc/free. They are duplicates of vmTestbase/gc/lock/malloc/ tests. Not sure that even vmTestbase/gc/lock/malloc/ are needed. GC do nothing with malloc/free and it doesn't make sense to test their interaction. 
------------- Commit messages: - 8290468: Remove gc/gctests/mallocWithGC tests Changes: https://git.openjdk.org/jdk/pull/9563/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9563&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290468 Stats: 540 lines in 7 files changed: 0 ins; 540 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9563.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9563/head:pull/9563 PR: https://git.openjdk.org/jdk/pull/9563 From jvernee at openjdk.org Tue Jul 19 20:55:38 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 19 Jul 2022 20:55:38 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows In-Reply-To: References: Message-ID: On Tue, 19 Jul 2022 20:13:35 GMT, Dean Long wrote: >> This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. >> >> Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. >> >> Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. >> >> I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. 
>> >> --- >> >> To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: >> >> PRAGMA_DIAG_PUSH >> PRAGMA_ALLOW_LOSSY_CONVERSIONS >> >> And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: >> >> PRAGMA_DIAG_POP >> >> 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. >> >> [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 > > I thought PRAGMA_DIAG_PUSH/POP are for narrowing the scope of the pragma. So unless we are concatenating .cpp files together, I would think they are not needed for .cpp files if we want to affect the whole file. @dean-long That's a good point. The PUSH and POP could be removed for the cpp files. ------------- PR: https://git.openjdk.org/jdk/pull/9516 From duke at openjdk.org Tue Jul 19 22:46:55 2022 From: duke at openjdk.org (Cesar Soares) Date: Tue, 19 Jul 2022 22:46:55 GMT Subject: RFR: 8241503: C2: Share MacroAssembler between mach nodes during code emission [v4] In-Reply-To: <3YuxRKFw0wOTbkB3kvVxbggYB4FRRWoHY3xOaD7xOUc=.9c100fcb-14ba-442d-b221-7851a9d8eb75@github.com> References: <-Q8DJD8lCN4calr3RAAv0vepUN8s_LE00kPPn9GPxNg=.d01d7d7a-c18c-4764-ae75-b15306bc7b3f@github.com> <3YuxRKFw0wOTbkB3kvVxbggYB4FRRWoHY3xOaD7xOUc=.9c100fcb-14ba-442d-b221-7851a9d8eb75@github.com> Message-ID: On Tue, 14 Jun 2022 03:56:47 GMT, Cesar Soares wrote: >> Hi there, can I please get some reviews on this change? 
The patch is to make the code reuse the same C2_MacroAssembler object during the emission of CPU instructions of a given compilation. >> >> As you'll see the change affects all backends. I've done my best to keep the changes minimal/simple. >> >> I tested this locally on Linux x86_64, x86_32 and MacOS Arm32, and ARM64. >> >> **I need help testing the changes on PPC, S390, and RISCV**. I cross-compiled the JVM locally and the builds are all succeeding, but I couldn't use an emulator (yet) or any real hardware (no access to one) to test the changes on these platforms. I see that GitHub actions do some tests on S390 and PPC but the tests seem to not be extensive. >> >> Thanks in advance, >> Cesar > > Cesar Soares has updated the pull request incrementally with one additional commit since the last revision: > > fix merge I'll resolve the conflicts and continue working on this. Any help testing will be appreciated. ------------- PR: https://git.openjdk.org/jdk/pull/9074 From dholmes at openjdk.org Wed Jul 20 04:44:05 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Jul 2022 04:44:05 GMT Subject: [jdk19] Integrated: 8278274: Update nroff pages in JDK 19 before RC In-Reply-To: References: Message-ID: On Sun, 17 Jul 2022 22:44:02 GMT, David Holmes wrote: > Please review these changes to the nroff manpage files so that they match their markdown sources that Oracle maintains. > > All pages at a minimum have 19-ea replaced with 19, and copyright set to 2022 if needed. Additionally: > > The Java manpage was missing updates from: > - [JDK-8282018](https://bugs.openjdk.org/browse/JDK-8282018): Add captions to tables on java man page. 
> > The Java manpage has slight formatting differences from: > - [JDK-8262004](https://bugs.openjdk.org/browse/JDK-8262004): Classpath separator: Man page says semicolon; should be colon on Linux > - [JDK-8236569](https://bugs.openjdk.org/browse/JDK-8236569): -Xss not multiple of 4K does not work for the main thread on macOS > > The Java manpage has a typo fixed in mainline by [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) (for JDK 20) > > > The keytool manpage was missing updates from: > - [JDK-8282014](https://bugs.openjdk.org/browse/JDK-8282014): Add captions to tables on keytool man page. > - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA > > The jar manpage was missing updates from: > - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) > > The jarsigner manpage was missing updates from: > - [JDK-8282015](https://bugs.openjdk.org/browse/JDK-8282015): Add captions to tables on jarsigner man page. 
> - [JDK-8267319](https://bugs.openjdk.org/browse/JDK-8267319): Use larger default key sizes and algorithms based on CNSA > > The javadoc manpage was missing updates from: > - [JDK-8279034](https://bugs.openjdk.org/browse/JDK-8279034): Update man page for javadoc `--date` option > > The jmod manpage was missing updates from: > - [JDK-8278764](https://bugs.openjdk.org/browse/JDK-8278764): jar and jmod man pages need the new --date documenting from CSR [JDK-8277755](https://bugs.openjdk.org/browse/JDK-8277755) > > The jpackage manpage was missing updates from: > - [JDK-8285146](https://bugs.openjdk.org/browse/JDK-8285146): Document jpackage resource dir feature > - [JDK-8284695](https://bugs.openjdk.org/browse/JDK-8284695): Update jpackage man pages for JDK 19 > - [JDK-8284209](https://bugs.openjdk.org/browse/JDK-8284209): Replace remaining usages of 'a the' in source code > > The jshell manpage was missing updates from: > - [JDK-8282016](https://bugs.openjdk.org/browse/JDK-8282016): Add captions to tables on jshell man page. This pull request has now been integrated. Changeset: 618f3a82 Author: David Holmes URL: https://git.openjdk.org/jdk19/commit/618f3a82a4d45cdb66b86259ae60dd1c322b987b Stats: 515 lines in 28 files changed: 431 ins; 16 del; 68 mod 8278274: Update nroff pages in JDK 19 before RC Reviewed-by: jjg ------------- PR: https://git.openjdk.org/jdk19/pull/145 From dholmes at openjdk.org Wed Jul 20 05:37:13 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Jul 2022 05:37:13 GMT Subject: [jdk19] RFR: 8278274: Update nroff pages in JDK 19 before RC In-Reply-To: References: Message-ID: On Mon, 18 Jul 2022 23:27:58 GMT, David Holmes wrote: >> src/java.base/share/man/keytool.1 line 456: >> >>> 454: \f[CB]PrivateKeyEntry\f[R] for the signer that already exists in the >>> 455: keystore. 
>>> 456: This option is used to sign the certificate with the signer?s private >> >> Not a problem with this PR as such, but we still have a `?` character in the output. > > Yeah I spotted that too, but it would need to be fixed in source and nroff. Must be some kind of "smart quote" from an editor. Do you think this needs to be fixed or just handle it in mainline? Filed [JDK-8290626](https://bugs.openjdk.org/browse/JDK-8290626). It can easily be fixed before RDP2. ------------- PR: https://git.openjdk.org/jdk19/pull/145 From shade at openjdk.org Wed Jul 20 06:05:05 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Jul 2022 06:05:05 GMT Subject: RFR: 8290495: Micro-optimize Method::can_be_statically_bound assertions [v2] In-Reply-To: References: Message-ID: <9NJTaapr15czw-j-Eo0_fG6ozfg-WwCQebQ79l_sMxI=.72910ad2-7626-4bee-8cee-78cd1d3d26d6@github.com> On Tue, 19 Jul 2022 07:53:57 GMT, Aleksey Shipilev wrote: >> See the rationale in the bug. >> >> The test time improves: >> >> >> # Before >> real 1m27.397s >> user 2m39.937s >> sys 0m5.966s >> >> # After >> real 1m13.443s ; -16% >> user 2m24.238s ; -10% >> sys 0m5.885s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Just move the ResourceMark Thanks all! ------------- PR: https://git.openjdk.org/jdk/pull/9548 From shade at openjdk.org Wed Jul 20 06:05:06 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Jul 2022 06:05:06 GMT Subject: Integrated: 8290495: Micro-optimize Method::can_be_statically_bound assertions In-Reply-To: References: Message-ID: On Tue, 19 Jul 2022 05:47:59 GMT, Aleksey Shipilev wrote: > See the rationale in the bug. > > The test time improves: > > > # Before > real 1m27.397s > user 2m39.937s > sys 0m5.966s > > # After > real 1m13.443s ; -16% > user 2m24.238s ; -10% > sys 0m5.885s This pull request has now been integrated. 
Changeset: 2ea3f546 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/2ea3f546c249cf32df460238da72c9744b3c1eb2 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod 8290495: Micro-optimize Method::can_be_statically_bound assertions Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/9548 From jwilhelm at openjdk.org Wed Jul 20 07:42:49 2022 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Wed, 20 Jul 2022 07:42:49 GMT Subject: Integrated: Merge jdk19 In-Reply-To: References: Message-ID: <55eEOvGRHsbKUu5hZ65TSNxRS6cS1ED5vnoZfxmgkJM=.86583e3d-bad5-471e-8d1b-c8e3283105be@github.com> On Tue, 19 Jul 2022 16:41:54 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 19 -> JDK 20 This pull request has now been integrated. Changeset: a3e07d95 Author: Jesper Wilhelmsson URL: https://git.openjdk.org/jdk/commit/a3e07d950ae752daf779607693c422a4c35924a6 Stats: 510 lines in 10 files changed: 437 ins; 0 del; 73 mod Merge ------------- PR: https://git.openjdk.org/jdk/pull/9561 From dholmes at openjdk.org Wed Jul 20 09:35:11 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 20 Jul 2022 09:35:11 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: <0VBhsZnv-Ixe4WG-L2IiWkWNpl--0WD00pUZT46rqb0=.2e3906bb-6bf1-4859-bd7d-05fe24389fdb@github.com> On Tue, 19 Jul 2022 10:10:25 GMT, Andrew Haley wrote: >>> To be on the safe side I'm putting this through our internal testing. Please hold off integrating until I give it the green light. Thanks. >> >> OK, thanks. There's no hurry, and no need to get this one into the next release. > >> To be on the safe side I'm putting this through our internal testing. Please hold off integrating until I give it the green light. Thanks. > > Would you like to do this again, please? It's been largely rewritten after review feedback. 
@theRealAph - our internal testing all passed: tiers 1-4 ------------- PR: https://git.openjdk.org/jdk/pull/9398 From eosterlund at openjdk.org Wed Jul 20 10:20:58 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 10:20:58 GMT Subject: RFR: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 [v2] In-Reply-To: References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> Message-ID: <8IWLiB44MvXqpZnbBXchZiBUvli1DVBgau-cPPg405Q=.b297069d-2271-4c66-aae3-aae04f97b898@github.com> On Tue, 19 Jul 2022 14:53:12 GMT, Erik Österlund wrote: >> The MacroAssembler::verified_entry method is a C2 function. It belongs in C2_MacroAssembler. This patch moves it there on x86. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32 bit build GHA seems happy. Thanks for the review, everyone. Time to /integrate ------------- PR: https://git.openjdk.org/jdk/pull/9558 From shade at redhat.com Wed Jul 20 10:30:49 2022 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 20 Jul 2022 12:30:49 +0200 Subject: RFC: JDK-8290706: Remove the support for inline contiguous allocations Message-ID: Hi there, This is a cleanup that I was itching to do for a while: https://bugs.openjdk.org/browse/JDK-8290706 I have a draft change for it, and it removes about 1 KLOC, mostly from platform-specific code. Before I submit that PR for review, please let me know if you have strong opinions about (not) doing this.
-- Thanks, -Aleksey From eosterlund at openjdk.org Wed Jul 20 10:31:03 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 10:31:03 GMT Subject: Integrated: 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 In-Reply-To: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> References: <0MCEuhsE62ozwMXoAaGAOocLhaFGPg6uzAKGhWcMXbI=.8849e39b-22b0-4843-878c-590ee9927c9a@github.com> Message-ID: On Tue, 19 Jul 2022 14:23:40 GMT, Erik Österlund wrote: > The MacroAssembler::verified_entry method is a C2 function. It belongs in C2_MacroAssembler. This patch moves it there on x86. This pull request has now been integrated. Changeset: 43c47b1a Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/43c47b1ad7453b4be5ad949d49866de1d911973e Stats: 196 lines in 6 files changed: 102 ins; 92 del; 2 mod 8290534: Move MacroAssembler::verified_entry to C2_MacroAssembler on x86 Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/9558 From erik.osterlund at oracle.com Wed Jul 20 10:34:35 2022 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 20 Jul 2022 10:34:35 +0000 Subject: RFC: JDK-8290706: Remove the support for inline contiguous allocations In-Reply-To: References: Message-ID: <4C516EBF-91A5-480D-B1F7-EA984D0FCAD7@oracle.com> Hi Aleksey, Yes, please. Go for it! A good summer cleanup. Thanks, /Erik > On 20 Jul 2022, at 12:30, Aleksey Shipilev wrote: > > Hi there, > > This is a cleanup that I was itching to do for a while: > https://bugs.openjdk.org/browse/JDK-8290706 > > I have a draft change for it, and it removes about 1 KLOC, mostly from platform-specific code. Before I submit that PR for review, please let me know if you have strong opinions about (not) doing this.
> > -- > Thanks, > -Aleksey > From eosterlund at openjdk.org Wed Jul 20 12:12:35 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 12:12:35 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers Message-ID: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas. 1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html). 2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch-not-taken path to be better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code. 3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff can be shortened. I can get behind that. Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible.
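The alignment argument in item 1 can be checked with a little arithmetic. The sketch below is only an illustration: the 8-byte instruction length and the 4 patched bytes come from the description, while the position of the immediate (assumed here to be the trailing 4 bytes) and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// From the description: the instruction is 8 bytes long and 4 bytes are patched.
constexpr std::uintptr_t kInstructionLength = 8;
constexpr std::uintptr_t kPatchedBytes = 4;
// Assumption for illustration: the patched immediate is the trailing 4 bytes.
constexpr std::uintptr_t kImmediateOffset = kInstructionLength - kPatchedBytes;

constexpr bool is_aligned(std::uintptr_t address, std::uintptr_t alignment) {
  return address % alignment == 0;
}

// True if the immediate of an instruction starting at `start` sits on a
// 4-byte boundary.
constexpr bool immediate_is_aligned(std::uintptr_t start) {
  return is_aligned(start + kImmediateOffset, kPatchedBytes);
}

// Counts start addresses in [begin, end) that satisfy `start_alignment`
// yet leave the immediate misaligned.
inline int count_misaligned_immediates(std::uintptr_t begin,
                                       std::uintptr_t end,
                                       std::uintptr_t start_alignment) {
  assert(start_alignment != 0);
  int misaligned = 0;
  for (std::uintptr_t start = begin; start < end; ++start) {
    if (is_aligned(start, start_alignment) && !immediate_is_aligned(start)) {
      ++misaligned;
    }
  }
  return misaligned;
}
```

Under that assumption, `count_misaligned_immediates(0, 4096, 4)` finds no misaligned immediates, so relaxing the start alignment from 8 to 4 bytes loses nothing, whereas a 2-byte start alignment would not be enough.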
------------- Commit messages: - Optimize x86 nmethod entry barriers Changes: https://git.openjdk.org/jdk/pull/9569/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9569&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290688 Stats: 160 lines in 15 files changed: 147 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/9569.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9569/head:pull/9569 PR: https://git.openjdk.org/jdk/pull/9569 From eosterlund at openjdk.org Wed Jul 20 12:16:36 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 12:16:36 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers [v2] In-Reply-To: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> Message-ID: > The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas. > > 1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html). > > 2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code. > > 3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened.
I can get behind that. > > Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: 32 bit build fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9569/files - new: https://git.openjdk.org/jdk/pull/9569/files/2e8dd854..a71e4ae7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9569&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9569&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9569.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9569/head:pull/9569 PR: https://git.openjdk.org/jdk/pull/9569 From aph at openjdk.org Wed Jul 20 13:06:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 20 Jul 2022 13:06:05 GMT Subject: Integrated: 8289743: AArch64: Clean up patching logic In-Reply-To: References: Message-ID: On Wed, 6 Jul 2022 13:28:06 GMT, Andrew Haley wrote: > The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. This pull request has now been integrated.
Changeset: 1c055076 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/1c055076e0e460275954cfc8d5e897d72bb9323e Stats: 546 lines in 6 files changed: 363 ins; 93 del; 90 mod 8289743: AArch64: Clean up patching logic Reviewed-by: adinn, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/9398 From jvernee at openjdk.org Wed Jul 20 13:58:07 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 20 Jul 2022 13:58:07 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v2] In-Reply-To: References: Message-ID: > This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. > > Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. > > Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. > > I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. > > --- > > To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: > > PRAGMA_DIAG_PUSH > PRAGMA_ALLOW_LOSSY_CONVERSIONS > > And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: > > PRAGMA_DIAG_POP > > 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. 
There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. > > [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Remove PUSH POP from cpp files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9516/files - new: https://git.openjdk.org/jdk/pull/9516/files/0484768d..7b309eb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9516&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9516&range=00-01 Stats: 711 lines in 237 files changed: 5 ins; 704 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9516.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9516/head:pull/9516 PR: https://git.openjdk.org/jdk/pull/9516 From jvernee at openjdk.org Wed Jul 20 14:15:54 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 20 Jul 2022 14:15:54 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v3] In-Reply-To: References: Message-ID: > This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. > > Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. 
> > Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. > > I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. > > --- > > To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: > > PRAGMA_DIAG_PUSH > PRAGMA_ALLOW_LOSSY_CONVERSIONS > > And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: > > PRAGMA_DIAG_POP > > 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. 
> > [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - Polish - Remove PUSH POP from test files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9516/files - new: https://git.openjdk.org/jdk/pull/9516/files/7b309eb7..d905b203 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9516&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9516&range=01-02 Stats: 35 lines in 14 files changed: 13 ins; 22 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9516.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9516/head:pull/9516 PR: https://git.openjdk.org/jdk/pull/9516 From jvernee at openjdk.org Wed Jul 20 14:18:49 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 20 Jul 2022 14:18:49 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v4] In-Reply-To: References: Message-ID: > This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. > > Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. > > Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. > > I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. 
> > --- > > To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: > > PRAGMA_DIAG_PUSH > PRAGMA_ALLOW_LOSSY_CONVERSIONS > > And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: > > PRAGMA_DIAG_POP > > 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. > > [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'master' into Warn_Narrow - Polish pt2 - Polish - Remove PUSH POP from test files - Remove PUSH POP from cpp files - Rest of the tests - More test - AArch64 - Disable for tests - Fix apostrophe - ... 
and 5 more: https://git.openjdk.org/jdk/compare/1c055076...fb276afd ------------- Changes: https://git.openjdk.org/jdk/pull/9516/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9516&range=03 Stats: 872 lines in 318 files changed: 869 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9516.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9516/head:pull/9516 PR: https://git.openjdk.org/jdk/pull/9516 From aph at openjdk.org Wed Jul 20 14:21:14 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 20 Jul 2022 14:21:14 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v3] In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 14:15:54 GMT, Jorn Vernee wrote: >> This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. >> >> Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. >> >> Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. >> >> I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. 
>> >> --- >> >> To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: >> >> PRAGMA_DIAG_PUSH >> PRAGMA_ALLOW_LOSSY_CONVERSIONS >> >> And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: >> >> PRAGMA_DIAG_POP >> >> 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. >> >> [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - Polish > - Remove PUSH POP from test files This feels to me like rather a blunt instrument. IMO, it would be better to use checked_cast() where lossy conversions happen. That would make code easier to understand, and it'd give us warnings when things go wrong. ------------- PR: https://git.openjdk.org/jdk/pull/9516 From jvernee at openjdk.org Wed Jul 20 14:46:04 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 20 Jul 2022 14:46:04 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v3] In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 14:18:45 GMT, Andrew Haley wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - Polish >> - Remove PUSH POP from test files > > This feels to me like rather a blunt instrument. IMO, it would be better to use checked_cast() where lossy conversions happen. That would make code easier to understand, and it'd give us warnings when things go wrong. 
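To make the suggestion above concrete, here is a minimal sketch of what a checked narrowing helper can look like. It is a hypothetical stand-in written for illustration (`fits_in` and `demo_checked_cast` are invented names), not HotSpot's actual `checked_cast()`; it simply asserts that the value survives the conversion unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when `value` round-trips through To without
// changing magnitude or sign.
template <typename To, typename From>
constexpr bool fits_in(From value) {
  return static_cast<From>(static_cast<To>(value)) == value &&
         ((value < From{}) == (static_cast<To>(value) < To{}));
}

// Hypothetical stand-in for a checked narrowing cast: asserts (in debug
// builds) that nothing is lost, then converts.
template <typename To, typename From>
inline To demo_checked_cast(From value) {
  assert(fits_in<To>(value) && "lossy conversion");
  return static_cast<To>(value);
}
```

With such a helper, `demo_checked_cast<std::int32_t>(some_int64)` documents the narrowing at the call site and traps in debug builds when the value does not fit; whether a given site wants a check like this or intentional truncation still has to be decided case by case.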
@theRealAph I think ultimately, each warning site should be inspected, and a bespoke solution should be found. Inserting `checked_cast` is only one of a possible set of solutions. Others being changing the type of variables used, or some wider refactoring. This change was mostly applied mechanically by parsing the build log, and then using a script to insert these lines, with a manual review + 1 or 2 fixups. Making sure the inserted `checked_cast`s are correct seems much harder. I think having `PRAGMA_ALLOW_LOSSY_CONVERSIONS` in the file also sends a much clearer message that: warnings in this file have not been looked at/fixed yet, which I don't think `checked_cast` does. I have my doubts that adding `checked_cast` everywhere is always correct. In some cases the truncation might be the desired behaviour (just not made explicit with a cast), and I don't want to run the risk of breaking such code where the tests don't catch it (while I also don't want to get into a lengthy process of fixing up those cases one by one). The approach I've taken preserves the current behavior of the code, but it also allows for fixing these warnings on a per-file basis. This seems to me like an easier and safer way to make progress. ------------- PR: https://git.openjdk.org/jdk/pull/9516 From aph at openjdk.org Wed Jul 20 15:17:02 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 20 Jul 2022 15:17:02 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v3] In-Reply-To: References: Message-ID: <_g8tdYQ5mT6j-D2LnVoZvg1yUbb-2bEBWsvEzS68Pro=.550ef5a4-952e-40dc-ae90-5ac26826c03c@github.com> On Wed, 20 Jul 2022 14:42:44 GMT, Jorn Vernee wrote: > This change was mostly applied mechanically by parsing the build log, and then using a script to insert these lines, with a manual review + 1 or 2 fixups. Making sure the inserted `checked_cast`s are correct seems much harder. It is. 
> I think having `PRAGMA_ALLOW_LOSSY_CONVERSIONS` in the file also sends a much clearer message that: warnings in this file have not been looked at/fixed yet, which I don't think `checked_cast` does. It kinda would, because such casts would have to be considered. But OK. > The approach I've taken preserves the current behavior of the code, but it also allows for fixing these warnings on a per-file basis (besides enabling the warning for the 800 or so files that don't have warnings right now). This seems to me like an easier and safer way to make progress. OK, so this is hopefully a temporary fix. ------------- PR: https://git.openjdk.org/jdk/pull/9516 From jvernee at openjdk.org Wed Jul 20 15:22:47 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 20 Jul 2022 15:22:47 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v3] In-Reply-To: <_g8tdYQ5mT6j-D2LnVoZvg1yUbb-2bEBWsvEzS68Pro=.550ef5a4-952e-40dc-ae90-5ac26826c03c@github.com> References: <_g8tdYQ5mT6j-D2LnVoZvg1yUbb-2bEBWsvEzS68Pro=.550ef5a4-952e-40dc-ae90-5ac26826c03c@github.com> Message-ID: On Wed, 20 Jul 2022 15:13:05 GMT, Andrew Haley wrote: > > I think having `PRAGMA_ALLOW_LOSSY_CONVERSIONS` in the file also sends a much clearer message that: warnings in this file have not been looked at/fixed yet, which I don't think `checked_cast` does. > > It kinda would, because such casts would have to be considered. But OK. What I mean is that `checked_cast` can also be used intentionally. So, looking at a particular `checked_cast` it might be hard to tell if this use case should still be addressed/fixed, or if it has already been addressed and the solution was to use `checked_cast`. > > The approach I've taken preserves the current behavior of the code, but it also allows for fixing these warnings on a per-file basis (besides enabling the warning for the 800 or so files that don't have warnings right now). This seems to me like an easier and safer way to make progress.
> > OK, so this is hopefully a temporary fix. Yes, this is not meant to be a long term solution. Just a way of allowing for more incremental progress, as well as a stop gap for files that are good today. ------------- PR: https://git.openjdk.org/jdk/pull/9516 From eosterlund at openjdk.org Wed Jul 20 15:55:56 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 15:55:56 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers [v3] In-Reply-To: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> Message-ID: > The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas. > > 1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html). > > 2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code. > > 3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened. I can get behind that. > > Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large.
With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: fix 32 bit build again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9569/files - new: https://git.openjdk.org/jdk/pull/9569/files/a71e4ae7..eeb5f82d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9569&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9569&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9569.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9569/head:pull/9569 PR: https://git.openjdk.org/jdk/pull/9569 From eosterlund at openjdk.org Wed Jul 20 16:34:51 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 16:34:51 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers [v4] In-Reply-To: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> Message-ID: > The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas. > > 1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html). > > 2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line.
This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code. > > 3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened. I can get behind that. > > Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: fixing 32 bit build again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9569/files - new: https://git.openjdk.org/jdk/pull/9569/files/eeb5f82d..c98b0753 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9569&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9569&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9569.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9569/head:pull/9569 PR: https://git.openjdk.org/jdk/pull/9569 From eosterlund at openjdk.org Wed Jul 20 17:01:32 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 17:01:32 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers Message-ID: The original nmethod entry barrier supported only concurrent patching of data and was used by ZGC to solve concurrent class unloading problems. Now it is starting to see more uses. Notably, loom uses nmethod entry barriers to figure out what nmethods have been seen on-stack, needed to remove nmethods safely.
However, the concurrent data patching variation was too slow for loom (showed a few regressions), so I brought over a faster nmethod entry barrier that we use in the generational ZGC repo, which additionally handles concurrent patching of data and instructions, which is needed there. However, I still see some small regressions. So for the uses in loom, the classic GCs don't really patch anything interesting concurrently. This leads to the following possible enhancements to improve the situation: 1. Make a dedicated nmethod entry barrier for GCs that don't patch data nor code concurrently, consisting of basically only a conditional branch. There is no need to penalize STW GCs with seat belts protecting against concurrent races that simply do not exist. 2. Move the "guard" word and call into the VM trampoline, to an out-of-line stub towards the end of the nmethod, ensuring instruction caches are not polluted by non-hot instructions at the nmethod entry. Some machines also better optimize the branch-not-taken path of conditional branches. With these optimizations, the small regressions did go away. 
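As a very rough conceptual model of the two enhancements above, the sketch below treats the barrier as a comparison of a per-nmethod guard word against a "disarmed" value, with everything else pushed into a cold, out-of-line path. This is an assumption made for illustration only: the names are hypothetical, nothing here mirrors the real AArch64 code, and it deliberately ignores concurrency.

```cpp
#include <cassert>

// Conceptual model only: hypothetical names, not HotSpot's data structures.
struct ModelNMethod {
  int guard_value;       // per-nmethod guard word
  int slow_path_visits;  // how often the cold path ran (for illustration)
};

// Cold path: stands in for the out-of-line stub that calls into the VM.
inline void model_slow_path(ModelNMethod& nm, int disarmed_value) {
  assert(nm.guard_value != disarmed_value);  // only reached when armed
  ++nm.slow_path_visits;
  nm.guard_value = disarmed_value;  // disarm so later entries stay fast
}

// Fast path: one compare and a branch that is normally not taken.
inline void model_enter(ModelNMethod& nm, int disarmed_value) {
  if (nm.guard_value != disarmed_value) {
    model_slow_path(nm, disarmed_value);
  }
}
```

In this model an armed nmethod takes the slow path once and then stays on the single-branch fast path until its guard is changed again, which is the property the out-of-line stub layout is meant to keep cheap.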
------------- Depends on: https://git.openjdk.org/jdk/pull/9569 Commit messages: - 8290700: Optimize AArch64 nmethod entry barriers Changes: https://git.openjdk.org/jdk/pull/9574/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9574&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290700 Stats: 178 lines in 14 files changed: 120 ins; 22 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/9574.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9574/head:pull/9574 PR: https://git.openjdk.org/jdk/pull/9574 From eosterlund at openjdk.org Wed Jul 20 17:09:03 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 17:09:03 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 16:54:53 GMT, Erik Österlund wrote: > The original nmethod entry barrier supported only concurrent patching of data and was used by ZGC to solve concurrent class unloading problems. Now it is starting to see more uses. Notably, loom uses nmethod entry barriers to figure out what nmethods have been seen on-stack, needed to remove nmethods safely. However, the concurrent data patching variation was too slow for loom (showed a few regressions), so I brought over a faster nmethod entry barrier that we use in the generational ZGC repo, which additionally handles concurrent patching of data and instructions, which is needed there. > However, I still see some small regressions. So for the uses in loom, the classic GCs don't really patch anything interesting concurrently. This leads to the following possible enhancements to improve the situation: > > 1. Make a dedicated nmethod entry barrier for GCs that don't patch data nor code concurrently, consisting of basically only a conditional branch. There is no need to penalize STW GCs with seat belts protecting against concurrent races that simply do not exist. > > 2.
Move the "guard" word and call into the VM trampoline, to an out-of-line stub towards the end of the nmethod, ensuring instruction caches are not polluted by non-hot instructions at the nmethod entry. Some machines also better optimize the branch-not-taken path of conditional branches. > > With these optimizations, the small regressions did go away. Note that this PR is dependent on https://github.com/openjdk/jdk/pull/9569/ ------------- PR: https://git.openjdk.org/jdk/pull/9574 From kvn at openjdk.org Wed Jul 20 20:08:02 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 20 Jul 2022 20:08:02 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers [v4] In-Reply-To: References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> Message-ID: On Wed, 20 Jul 2022 16:34:51 GMT, Erik Österlund wrote: >> The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas. >> >> 1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html). >> >> 2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code. >> >> 3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened. I can get behind that.
>> >> Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > fixing 32 bit build again Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9569 From eosterlund at openjdk.org Wed Jul 20 20:24:03 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 20 Jul 2022 20:24:03 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers [v4] In-Reply-To: References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> Message-ID: On Wed, 20 Jul 2022 20:05:40 GMT, Vladimir Kozlov wrote: > Looks good. Thanks for the review, @vnkozlov! ------------- PR: https://git.openjdk.org/jdk/pull/9569 From dholmes at openjdk.org Thu Jul 21 01:55:59 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 21 Jul 2022 01:55:59 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v4] In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 14:18:49 GMT, Jorn Vernee wrote: >> This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings. This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. i.e. it is not meant as a long term solution, but as a way of allowing incremental progress.
>> >> Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. >> >> Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. >> >> I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. >> >> --- >> >> To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: >> >> PRAGMA_DIAG_PUSH >> PRAGMA_ALLOW_LOSSY_CONVERSIONS >> >> And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: >> >> PRAGMA_DIAG_POP >> >> 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. >> >> [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into Warn_Narrow > - Polish pt2 > - Polish > - Remove PUSH POP from test files > - Remove PUSH POP from cpp files > - Rest of the tests > - More test > - AArch64 > - Disable for tests > - Fix apostrophe > - ... 
and 5 more: https://git.openjdk.org/jdk/compare/1c055076...fb276afd So IIUC the idea here is to turn on this warning to catch lossy conversions going forward, but to suppress the warning on all existing cases in the hope that eventually one day they will get looked at and fixed as appropriate. Is this such an insidious problem that we absolutely must prevent any future occurrences from arising, noting that if they happen in a file already ignoring the warning then we won't notice anyway? To me this needs to be a first step in a well-defined and resourced plan to actually address these issues, not just the first step with a hope other steps will follow. Just my 2c. Cheers. ------------- PR: https://git.openjdk.org/jdk/pull/9516 From shade at openjdk.org Thu Jul 21 05:40:33 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 21 Jul 2022 05:40:33 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations Message-ID: See the bug for rationale and link to RFC. This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. Additional testing: - [x] Linux x86_64 fastdebug `tier1` - [x] Linux x86_32 fastdebug `tier1` - [x] Linux x86_64 Zero build - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) - [x] Linux RISC-V cross-build (attn @RealFYang) Apart from x86, I only verified the cross-compilation builds pass, no other testing is done. I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. 
------------- Commit messages: - Work Changes: https://git.openjdk.org/jdk/pull/9576/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9576&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290706 Stats: 1007 lines in 45 files changed: 3 ins; 961 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/9576.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9576/head:pull/9576 PR: https://git.openjdk.org/jdk/pull/9576 From duke at openjdk.org Thu Jul 21 08:12:03 2022 From: duke at openjdk.org (duke) Date: Thu, 21 Jul 2022 08:12:03 GMT Subject: Withdrawn: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events In-Reply-To: References: Message-ID: On Wed, 25 May 2022 12:24:03 GMT, Joakim Nordström wrote: > Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. > > # Testing > - jdk_jfr This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/8883 From tschatzl at openjdk.org Thu Jul 21 08:17:33 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 21 Jul 2022 08:17:33 GMT Subject: RFR: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() Message-ID: Hi all, please review this change that fixes some callers of `G1CollectedHeap::heap_region_containing` to a) fail on trying to get the `HeapRegion*` on uncommitted regions and b) change some callers to use the correct `heap_region_containing_or_null` method that had been intended there. Also remove some now unnecessary asserts as `heap_region_containing` will fail when it previously returned `nullptr`.
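The contract described above can be sketched with a toy heap model: one lookup asserts that the address lies in a committed region, and the `_or_null` variant reports failure to the caller. This is a simplified illustration under assumptions, not the G1 code: `HeapModel`, `RegionModel`, `commit`, `region_containing` and `region_containing_or_null` are hypothetical names, and addresses are plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region; only tracks commit state.
struct RegionModel {
  bool committed;
};

class HeapModel {
  std::vector<RegionModel> _regions;
  size_t _region_size;

 public:
  HeapModel(size_t num_regions, size_t region_size)
      : _regions(num_regions, RegionModel{false}), _region_size(region_size) {}

  void commit(size_t index) { _regions.at(index).committed = true; }

  // Tolerant lookup: returns nullptr for addresses outside the heap or
  // inside an uncommitted region, so the caller must check the result.
  RegionModel* region_containing_or_null(size_t addr) {
    size_t index = addr / _region_size;
    if (index >= _regions.size() || !_regions[index].committed) {
      return nullptr;
    }
    return &_regions[index];
  }

  // Strict lookup: the caller promises the address is in a committed region,
  // so callers no longer need their own null asserts.
  RegionModel* region_containing(size_t addr) {
    RegionModel* r = region_containing_or_null(addr);
    assert(r != nullptr && "address must be in a committed region");
    return r;
  }
};
```

Callers that can legitimately see uncommitted or out-of-heap addresses would use the `_or_null` form; everything else gets the failing check for free.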
Testing: tier1-5 Thanks, Thomas ------------- Depends on: https://git.openjdk.org/jdk/pull/9572 Commit messages: - Some more removal of unnecessary checks - Initial changes - Fix the places where the wrong heap_region_containing() method has been used Changes: https://git.openjdk.org/jdk/pull/9584/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9584&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290715 Stats: 22 lines in 5 files changed: 0 ins; 8 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/9584.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9584/head:pull/9584 PR: https://git.openjdk.org/jdk/pull/9584 From rrich at openjdk.org Thu Jul 21 08:29:03 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 21 Jul 2022 08:29:03 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 17:39:28 GMT, Aleksey Shipilev wrote: > See the bug for rationale and link to RFC. > > This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) > - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) > - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) > - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) > - [x] Linux RISC-V cross-build (attn @RealFYang) > > Apart from x86, I only verified the cross-compilation builds pass, no other testing is done. > > I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. jtreg:test/hotspot/jtreg:tier1 succeeded (without gtests) on ppc64le. More tests overnight. 
------------- PR: https://git.openjdk.org/jdk/pull/9576 From dlong at openjdk.org Thu Jul 21 09:17:00 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Jul 2022 09:17:00 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Mon, 18 Jul 2022 13:36:46 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic We are seeing some new crashes after this change. The cause seems to be bad values using ADRP: see JDK-8290780. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Thu Jul 21 09:22:10 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 21 Jul 2022 09:22:10 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Thu, 21 Jul 2022 09:13:44 GMT, Dean Long wrote: > We are seeing some new crashes after this change. The cause seems to be bad values using ADRP: see JDK-8290780. Oh! I thought that one had been tested fully. 
------------- PR: https://git.openjdk.org/jdk/pull/9398 From ngasson at openjdk.org Thu Jul 21 09:34:10 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Thu, 21 Jul 2022 09:34:10 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Thu, 21 Jul 2022 09:18:14 GMT, Andrew Haley wrote: > We are seeing some new crashes after this change. The cause seems to be bad values using ADRP: see JDK-8290780. Is this test in the public repository? I can't find RunThese30M.java anywhere. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From duke at openjdk.org Thu Jul 21 09:49:57 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Thu, 21 Jul 2022 09:49:57 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines [v4] In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: > A trampoline stub could be generated for each runtime call. These trampolines could be duplicates if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. > > The benchmarks als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) were tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average.
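The "one stub per distinct callee" idea can be sketched as follows: call sites are recorded first, and stubs are generated afterwards from a table keyed by callee address. This is a minimal model under assumptions, not the patch itself: `TrampolineTableModel`, `request` and `emit_all` are hypothetical names, and a `std::unordered_map` stands in for whatever hashtable the patch uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model: records runtime call sites, then emits one trampoline
// stub per distinct callee instead of one per call site.
class TrampolineTableModel {
  // callee address -> offsets of the call sites that target it
  std::unordered_map<uintptr_t, std::vector<size_t>> _sites_by_callee;
  size_t _requests = 0;

 public:
  // Called where a runtime call is emitted; stub generation is delayed.
  void request(size_t call_site_offset, uintptr_t callee) {
    _sites_by_callee[callee].push_back(call_site_offset);
    ++_requests;
  }

  // Called once at the end; returns the number of stubs that would be
  // generated, which equals the number of distinct callees.
  size_t emit_all() const { return _sites_by_callee.size(); }

  size_t requests() const { return _requests; }
};
```

With three call sites that target two distinct callees, `emit_all()` reports two stubs instead of three.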
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Use ResizeableResourceHashtable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9405/files - new: https://git.openjdk.org/jdk/pull/9405/files/df99b229..9e5acc96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=02-03 Stats: 69 lines in 4 files changed: 15 ins; 37 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/9405.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9405/head:pull/9405 PR: https://git.openjdk.org/jdk/pull/9405 From dlong at openjdk.org Thu Jul 21 09:59:12 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Jul 2022 09:59:12 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Thu, 21 Jul 2022 09:30:34 GMT, Nick Gasson wrote: > Is this test in the public repository? I can't find RunThese30M.java anywhere. Unfortunately no, it's a closed Oracle test. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Thu Jul 21 11:14:18 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 21 Jul 2022 11:14:18 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Thu, 21 Jul 2022 09:30:34 GMT, Nick Gasson wrote: > We are seeing some new crashes after this change. The cause seems to be bad values using ADRP: see JDK-8290780. I'm on it now. I think it might be a sign-extension bug when calculating the page offset. 
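For readers unfamiliar with the failure class suspected above: ADRP carries a signed 21-bit page delta split across two fields (`immlo` in bits 30:29, `immhi` in bits 23:5), and forgetting to sign-extend it turns a small negative delta into a large positive one. The decoder below is only an illustration of correct sign extension; `adrp_page_delta` is a hypothetical helper, and nothing here is claimed to be the actual cause of JDK-8290780.

```cpp
#include <cstdint>

// Decode the byte offset (relative to the 4 KiB page of the instruction)
// encoded in an AArch64 ADRP instruction word.
inline int64_t adrp_page_delta(uint32_t insn) {
  uint32_t immlo = (insn >> 29) & 0x3;      // low 2 bits of the immediate
  uint32_t immhi = (insn >> 5) & 0x7ffff;   // high 19 bits of the immediate
  uint32_t imm21 = (immhi << 2) | immlo;    // 21-bit two's complement value

  // Sign-extend from bit 20. Skipping this step would treat a delta of
  // -1 page as +2097151 pages.
  int64_t pages = static_cast<int64_t>(imm21);
  if (imm21 & (1u << 20)) {
    pages -= (int64_t{1} << 21);
  }
  return pages * 4096;
}
```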
------------- PR: https://git.openjdk.org/jdk/pull/9398 From coleenp at openjdk.org Thu Jul 21 12:59:52 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 21 Jul 2022 12:59:52 GMT Subject: RFR: 8290718: Remove ALLOCATION_SUPER_CLASS_SPEC Message-ID: This super class simply exists to have virtual functions for print() in non-product mode. In CHeapObj objects, it creates a vptr that break deterministic archive creation when I change the type of ModuleEntry in upcoming work. I moved the virtual print functions to ResourceObj and CompilationResourceObj since there are still several print functions called for those types. Many were for the compilation types. Maybe someone should fix those someday. Tested with tier1 on Oracle supported platforms and builds-only for others (ppc, s390, etc). ------------- Commit messages: - 8290718: Remove ALLOCATION_SUPER_CLASS_SPEC Changes: https://git.openjdk.org/jdk/pull/9591/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9591&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290718 Stats: 51 lines in 6 files changed: 18 ins; 23 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/9591.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9591/head:pull/9591 PR: https://git.openjdk.org/jdk/pull/9591 From bulasevich at openjdk.org Thu Jul 21 13:24:03 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 21 Jul 2022 13:24:03 GMT Subject: RFR: 8288477: nmethod header size reduction In-Reply-To: References: <7mxKH7I2VPLTgBZ1fu2yVkEZZoGFSLx7UDbnDX3FNi8=.5252afc4-fb24-4225-a7fc-dc648e89076b@github.com> Message-ID: <-3Dx8uuVxSsJSAtiKHvonNaPFuMTBZlmq7b-hhoGwzw=.ced9ab57-f6ea-4550-a7c0-657e41bbdc3d@github.com> On Fri, 8 Jul 2022 20:51:00 GMT, Vladimir Kozlov wrote: > > > Most of the files changed are because of CompLevel. It feels a little disruptive. I'd rather do the minimal changes. > > > > > > Do you mind using the CompilerType? Since we have this type defined, I think it should be used. 
Does it make sense to propose this int->CompilerType cleanup as a separate change prior to this one? > > I was going to suggest doing it as a separate change after this one. Ok. I removed the unnecessary changes. I will bring them as a separate change. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From bulasevich at openjdk.org Thu Jul 21 13:32:08 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 21 Jul 2022 13:32:08 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 13:15:20 GMT, Boris Ulasevich wrote: > Would be interesting to profile fields access. I assume hot fields should be in first cache line. I created JDK-8290818 for that. I do not know profile tools to grab the access statistics automatically. I think I need to patch all the nmethod getters (check if it is done from asm), and run some apps to collect the data with the patched VM. Please let me know if you know a better way. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From bulasevich at openjdk.org Thu Jul 21 14:01:12 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 21 Jul 2022 14:01:12 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations In-Reply-To: References: Message-ID: <3EkHVEzdrQf1Tfn56On-Puqwzwj6aJ4Oy0ZYL-7RZXU=.d14990ed-c416-476a-9755-db82b10b65d4@github.com> On Wed, 20 Jul 2022 17:39:28 GMT, Aleksey Shipilev wrote: > See the bug for rationale and link to RFC. > > This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. 
> > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) > - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) > - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) > - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) > - [x] Linux RISC-V cross-build (attn @RealFYang) > > Apart from x86, I only verified the cross-compilation builds pass, no other testing is done. > > I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. :hotspot and :tier1 is OK on ARM32 ------------- PR: https://git.openjdk.org/jdk/pull/9576 From duke at openjdk.org Thu Jul 21 14:07:23 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 21 Jul 2022 14:07:23 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 Message-ID: `trampoline_call` can do dummy code generation to calculate the size of C2 generated code. This is done in the output phase. In [src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042](https://github.com/openjdk/jdk/blob/e0d361cea91d3dd1450aece73f660b4abb7ce5fa/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042) Loom code needed to generate a trampoline call outside of C2 and without the output phase. This caused test crashes. The project Loom added `trampoline_call1` to workaround the crashes. This PR improves detection of C2 output phase which makes `trampoline_call1` redundant. 
Tested the fastdebug/release builds: - `'gtest`: Passed - `tier1`...`tier2`: Passed ------------- Commit messages: - 8287393: AArch64: Remove trampoline_call1 Changes: https://git.openjdk.org/jdk/pull/9592/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9592&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8287393 Stats: 12 lines in 2 files changed: 6 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/9592.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9592/head:pull/9592 PR: https://git.openjdk.org/jdk/pull/9592 From zgu at openjdk.org Thu Jul 21 14:11:48 2022 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 21 Jul 2022 14:11:48 GMT Subject: RFR: 8290718: Remove ALLOCATION_SUPER_CLASS_SPEC In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 12:50:04 GMT, Coleen Phillimore wrote: > This super class simply exists to have virtual functions for print() in non-product mode. In CHeapObj objects, it creates a vptr that break deterministic archive creation when I change the type of ModuleEntry in upcoming work. > I moved the virtual print functions to ResourceObj and CompilationResourceObj since there are still several print functions called for those types. Many were for the compilation types. Maybe someone should fix those someday. > Tested with tier1 on Oracle supported platforms and builds-only for others (ppc, s390, etc). LGTM ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.org/jdk/pull/9591 From shade at openjdk.org Thu Jul 21 16:17:38 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 21 Jul 2022 16:17:38 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: > See the bug for rationale and link to RFC. > > This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. 
There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) > - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) > - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) > - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) > - [x] Linux RISC-V cross-build (attn @RealFYang) > > Apart from x86, I only verified the cross-compilation builds pass, no other testing is done. > > I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge branch 'master' into JDK-8290706-remove-inline-contig - Work ------------- Changes: https://git.openjdk.org/jdk/pull/9576/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9576&range=01 Stats: 1007 lines in 45 files changed: 3 ins; 961 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/9576.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9576/head:pull/9576 PR: https://git.openjdk.org/jdk/pull/9576 From coleenp at openjdk.org Thu Jul 21 16:33:04 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 21 Jul 2022 16:33:04 GMT Subject: RFR: 8290718: Remove ALLOCATION_SUPER_CLASS_SPEC In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 12:50:04 GMT, Coleen Phillimore wrote: > This super class simply exists to have virtual functions for print() in non-product mode. In CHeapObj objects, it creates a vptr that break deterministic archive creation when I change the type of ModuleEntry in upcoming work. > I moved the virtual print functions to ResourceObj and CompilationResourceObj since there are still several print functions called for those types. 
Many were for the compilation types. Maybe someone should fix those someday. > Tested with tier1 on Oracle supported platforms and builds-only for others (ppc, s390, etc). Thanks Zhengyu. I've been also building shenandoah accidentally so it works for that too. ------------- PR: https://git.openjdk.org/jdk/pull/9591 From duke at openjdk.org Thu Jul 21 16:37:03 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 21 Jul 2022 16:37:03 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2] In-Reply-To: References: Message-ID: > `trampoline_call` can do dummy code generation to calculate the size of C2 generated code. This is done in the output phase. In [src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042](https://github.com/openjdk/jdk/blob/e0d361cea91d3dd1450aece73f660b4abb7ce5fa/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042) Loom code needed to generate a trampoline call outside of C2 and without the output phase. This caused test crashes. The project Loom added `trampoline_call1` to workaround the crashes. > > This PR improves detection of C2 output phase which makes `trampoline_call1` redundant. 
> > Tested the fastdebug/release builds: > - `'gtest`: Passed > - `tier1`...`tier2`: Passed Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Replace trampoline_call1 with trampoline_call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9592/files - new: https://git.openjdk.org/jdk/pull/9592/files/8ba17f7b..e732890b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9592&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9592&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9592.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9592/head:pull/9592 PR: https://git.openjdk.org/jdk/pull/9592 From kvn at openjdk.org Thu Jul 21 17:00:14 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Jul 2022 17:00:14 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 20:39:55 GMT, Boris Ulasevich wrote: >> Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. >> >> Cleanup work: apply CompLevel type where applicable. >> >> The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >> >> Renaissance benchmarks shows no performance regressions on x86 and aarch. 
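The effect of sorting fields from largest to smallest can be reproduced with two toy structs that hold the same information. This is a sketch with made-up fields, not the real `CodeBlob` or `nmethod` layout: `UnsortedHeader` and `SortedHeader` are hypothetical, and the exact sizes depend on the target ABI.

```cpp
#include <cstddef>
#include <cstdint>

// Fields in "historical" order: small fields between pointers force padding.
struct UnsortedHeader {
  int32_t kind;   // stored as a 4-byte int; padding typically follows on LP64
  void*   begin;
  bool    flag;   // padding typically follows before the next pointer
  void*   end;
  int32_t size;
};

// Same information, fields sorted from largest to smallest, and the
// small enum-like value narrowed to one byte.
struct SortedHeader {
  void*   begin;
  void*   end;
  int32_t size;
  uint8_t kind;
  bool    flag;
};

// The sorted layout is never larger than the unsorted one on common ABIs.
static_assert(sizeof(SortedHeader) <= sizeof(UnsortedHeader),
              "sorting fields by size should not grow the struct");
```

Printing `sizeof` for both structs, or running `ptype /o` on them in gdb, shows where the holes went.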
>> >> BEFORE: >> >> (gdb) ptype /o CodeBlob >> /* offset | size */ type = class CodeBlob { >> /* 8 | 4 */ const CompilerType _type; <<<< >> /* 12 | 4 */ int _size; >> /* 16 | 4 */ int _header_size; >> /* 20 | 4 */ int _frame_complete_offset; >> /* 24 | 4 */ int _data_offset; >> /* 28 | 4 */ int _frame_size; >> /* 32 | 8 */ address _code_begin; >> /* 40 | 8 */ address _code_end; >> /* 48 | 8 */ address _content_begin; >> /* 56 | 8 */ address _data_end; >> /* 64 | 8 */ address _relocation_begin; >> /* 72 | 8 */ address _relocation_end; >> /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; >> /* 88 | 1 */ bool _caller_must_gc_arguments; >> /* 89 | 1 */ bool _is_compiled; >> /* XXX 6-byte hole */ >> /* 96 | 8 */ const char *_name; >> /* 104 | 8 */ class AsmRemarks { >> /* 104 | 8 */ AsmRemarkCollection *_remarks; >> } _asm_remarks; >> /* 112 | 8 */ class DbgStrings { >> /* 112 | 8 */ DbgStringCollection *_strings; >> } _dbg_strings; >> >> /* total size (bytes): 120 */ >> } >> >> AFTER: >> >> (gdb) ptype /o CodeBlob >> /* offset | size */ type = class CodeBlob { >> protected: >> /* 8 | 8 */ address _code_begin; >> /* 16 | 8 */ address _code_end; >> /* 24 | 8 */ address _content_begin; >> /* 32 | 8 */ address _data_end; >> /* 40 | 8 */ address _relocation_begin; >> /* 48 | 8 */ address _relocation_end; >> /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; >> /* 64 | 8 */ const char *_name; >> /* 72 | 4 */ int _size; >> /* 76 | 4 */ int _header_size; >> /* 80 | 4 */ int _frame_complete_offset; >> /* 84 | 4 */ int _data_offset; >> /* 88 | 4 */ int _frame_size; >> /* 92 | 1 */ bool _caller_must_gc_arguments; >> /* 93 | 1 */ bool _is_compiled; >> /* 94 | 1 */ const CompilerType _type; <<<< >> /* XXX 1-byte hole */ >> /* 96 | 8 */ class AsmRemarks { >> /* 96 | 8 */ AsmRemarkCollection *_remarks; >> } _asm_remarks; >> /* 104 | 8 */ class DbgStrings { >> /* 104 | 8 */ DbgStringCollection *_strings; >> } _dbg_strings; >> >> /* total size (bytes): 112 */ >> } >> >> BEFORE: >> >> (gdb) ptype /o 
nmethod >> /* offset | size */ type = class nmethod : public CompiledMethod { >> private: >> /* 208 | 4 */ int _entry_bci; >> /* XXX 4-byte hole */ >> /* 216 | 8 */ uint64_t _gc_epoch; >> /* 224 | 8 */ nmethod *_osr_link; >> /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; >> /* 240 | 8 */ address _entry_point; >> /* 248 | 8 */ address _verified_entry_point; >> /* 256 | 8 */ address _osr_entry_point; >> /* 264 | 4 */ int _exception_offset; >> /* 268 | 4 */ int _unwind_handler_offset; >> /* 272 | 4 */ int _consts_offset; >> /* 276 | 4 */ int _stub_offset; >> /* 280 | 4 */ int _oops_offset; >> /* 284 | 4 */ int _metadata_offset; >> /* 288 | 4 */ int _scopes_data_offset; >> /* 292 | 4 */ int _scopes_pcs_offset; >> /* 296 | 4 */ int _dependencies_offset; >> /* 300 | 4 */ int _handler_table_offset; >> /* 304 | 4 */ int _nul_chk_table_offset; >> /* 308 | 4 */ int _speculations_offset; >> /* 312 | 4 */ int _jvmci_data_offset; >> /* 316 | 4 */ int _nmethod_end_offset; >> /* 320 | 4 */ int _orig_pc_offset; >> /* 324 | 4 */ int _compile_id; >> /* 328 | 4 */ int _comp_level; <<<< >> /* 332 | 1 */ bool _has_flushed_dependencies; >> /* 333 | 1 */ bool _unload_reported; >> /* 334 | 1 */ bool _load_reported; >> /* 335 | 1 */ volatile signed char _state; >> /* 336 | 1 */ bool _oops_are_stale; >> /* XXX 3-byte hole */ >> /* 340 | 4 */ RTMState _rtm_state; >> /* 344 | 4 */ volatile jint _lock_count; >> /* XXX 4-byte hole */ >> /* 352 | 8 */ volatile int64_t _stack_traversal_mark; >> /* 360 | 4 */ int _hotness_counter; >> /* 364 | 1 */ volatile uint8_t _is_unloading_state; >> /* XXX 3-byte hole */ >> /* 368 | 4 */ ByteSize _native_receiver_sp_offset; >> /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; >> >> /* total size (bytes): 376 */ >> } >> >> AFTER: >> >> (gdb) ptype /o nmethod >> /* offset | size */ type = class nmethod : public CompiledMethod { >> /* 200 | 8 */ uint64_t _gc_epoch; >> /* 208 | 8 */ volatile int64_t _stack_traversal_mark; >> /* 216 | 8 
*/ nmethod *_osr_link; >> /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; >> /* 232 | 8 */ address _entry_point; >> /* 240 | 8 */ address _verified_entry_point; >> /* 248 | 8 */ address _osr_entry_point; >> /* 256 | 4 */ int _entry_bci; >> /* 260 | 4 */ int _exception_offset; >> /* 264 | 4 */ int _unwind_handler_offset; >> /* 268 | 4 */ int _consts_offset; >> /* 272 | 4 */ int _stub_offset; >> /* 276 | 4 */ int _oops_offset; >> /* 280 | 4 */ int _metadata_offset; >> /* 284 | 4 */ int _scopes_data_offset; >> /* 288 | 4 */ int _scopes_pcs_offset; >> /* 292 | 4 */ int _dependencies_offset; >> /* 296 | 4 */ int _handler_table_offset; >> /* 300 | 4 */ int _nul_chk_table_offset; >> /* 304 | 4 */ int _speculations_offset; >> /* 308 | 4 */ int _jvmci_data_offset; >> /* 312 | 4 */ int _nmethod_end_offset; >> /* 316 | 4 */ int _orig_pc_offset; >> /* 320 | 4 */ int _compile_id; >> /* 324 | 4 */ RTMState _rtm_state; >> /* 328 | 4 */ volatile jint _lock_count; >> /* 332 | 4 */ int _hotness_counter; >> /* 336 | 4 */ ByteSize _native_receiver_sp_offset; >> /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; >> /* 344 | 1 */ CompLevel _comp_level; <<<< >> /* 345 | 1 */ volatile uint8_t _is_unloading_state; >> /* 346 | 1 */ bool _has_flushed_dependencies; >> /* 347 | 1 */ bool _unload_reported; >> /* 348 | 1 */ bool _load_reported; >> /* 349 | 1 */ volatile signed char _state; >> /* 350 | 1 */ bool _oops_are_stale; >> >> /* total size (bytes): 352 */ >> } > > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Undo applying CompLevel where applicable. It must be a separate change This version looks good to me. I will test it. @dougxc these changes affect JVMCI. Please, review it. They need to be uploaded to Graal's JVMCI after pushed. 
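The size drop shown in the dumps above (CodeBlob 120 to 112 bytes, nmethod 376 to 352 bytes) comes from two moves: ordering fields widest first, and storing small enum values in one byte. A minimal sketch of the same idea, assuming a typical 64-bit ABI; the struct, enum, and function names are hypothetical stand-ins for illustration and are not HotSpot code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
// Underlying type left to the compiler (typically int-sized).
enum WideLevel { kWideNone = 0, kWideFull = 4 };
// Underlying type forced to one byte.
enum NarrowLevel : std::int8_t { kNarrowNone = 0, kNarrowFull = 4 };

// Declaration order mixes widths, so the compiler inserts alignment holes.
struct HeaderBefore {
  int           entry_bci;   // 4 bytes, usually followed by a 4-byte hole
  std::uint64_t gc_epoch;    // 8 bytes
  WideLevel     comp_level;  // typically 4 bytes
  bool          is_stale;    // 1 byte, usually followed by a 3-byte hole
  int           lock_count;  // 4 bytes, usually followed by a 4-byte hole
  std::int64_t  mark;        // 8 bytes
};

// Same fields, widest first, with the level stored in one byte.
struct HeaderAfter {
  std::uint64_t gc_epoch;
  std::int64_t  mark;
  int           entry_bci;
  int           lock_count;
  NarrowLevel   comp_level;
  bool          is_stale;    // only tail padding remains after this field
};

constexpr std::size_t size_before() { return sizeof(HeaderBefore); }
constexpr std::size_t size_after()  { return sizeof(HeaderAfter); }

// Exact sizes are ABI-specific; the reordered layout should never be larger.
static_assert(sizeof(HeaderAfter) <= sizeof(HeaderBefore),
              "reordering should not grow the struct");
```

On a typical LP64 target these two structs come out at 40 and 32 bytes, but the exact numbers depend on the ABI, which is why the sketch only asserts the relation between them.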
------------- PR: https://git.openjdk.org/jdk/pull/9165 From kvn at openjdk.org Thu Jul 21 17:00:16 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Jul 2022 17:00:16 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: <61pOE0ToTQdpDF86nQd1Rx4ZAV1g1SRaIUhlvOK1u1k=.98bad706-3a9f-44c0-94d6-ddcfca0d35cd@github.com> On Thu, 21 Jul 2022 13:28:24 GMT, Boris Ulasevich wrote: > > Would be interesting to profile fields access. I assume hot fields should be in first cache line. > > I created JDK-8290818 for that. I do not know profile tools to grab the access statistics automatically. I think I need to patch all the nmethod getters (check if it is done from asm), and run some apps to collect the data with the patched VM. Please let me know if you know a better way. @ericcaspole can you suggest how we can do that without patching VM code? ------------- PR: https://git.openjdk.org/jdk/pull/9165 From never at openjdk.org Thu Jul 21 18:29:11 2022 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 21 Jul 2022 18:29:11 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 20:39:55 GMT, Boris Ulasevich wrote: >> Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. >> >> Cleanup work: apply CompLevel type where applicable. >> >> The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >> >> Renaissance benchmarks shows no performance regressions on x86 and aarch. 
> > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Undo applying CompLevel where applicable. It must be a separate change Doug is out but I took a look. The changes look fine and since Graal doesn't access nmethod::_comp_level directly we don't need to adjust to these changes.
src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 258: > 256: \ > 257: nonstatic_field(nmethod, _verified_entry_point, address) \ > 258: nonstatic_field(nmethod, _comp_level, int) \ You should declare CompLevel in this file as well. I think it might be missing the sanity checking that detects missing type declarations. ------------- Marked as reviewed by never (Reviewer). PR: https://git.openjdk.org/jdk/pull/9165 From jwaters at openjdk.org Thu Jul 21 18:46:41 2022 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 21 Jul 2022 18:46:41 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information Message-ID: Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it is only used for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and correct the documentation for MethodData as well.
------------- Commit messages: - Update compilationPolicy.hpp - Minor comment cleanup - Better clarify documentation Changes: https://git.openjdk.org/jdk/pull/9598/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290834 Stats: 9 lines in 2 files changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9598.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9598/head:pull/9598 PR: https://git.openjdk.org/jdk/pull/9598 From never at openjdk.org Thu Jul 21 19:01:08 2022 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 21 Jul 2022 19:01:08 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 20:39:55 GMT, Boris Ulasevich wrote: >> Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. >> >> Cleanup work: apply CompLevel type where applicable. >> >> The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >> >> Renaissance benchmarks shows no performance regressions on x86 and aarch. 
> > Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Undo applying CompLevel where applicable. It must be a separate change Why not be more aggressive about using offsets instead of explicit addresses? Why store both _begin and _end fields?
The end of one section is usually the beginning of another section. These explicit fields aren't very nice either. Why not use an enum with arrays for the various values? Then this stuff can all be written in a consistent pattern using offsets, taking the end of a section from the beginning of the next section. Section alignments could also be declared in a consistent way using the enum. You could probably smush _is_compiled, _caller_must_gc_arguments, _type and _header_size into a bitfield. Certainly _header_size doesn't need to be a full int since sizeof(nmethod) is only around 400 bytes. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From dlong at openjdk.org Thu Jul 21 20:20:08 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 21 Jul 2022 20:20:08 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Mon, 18 Jul 2022 13:36:46 GMT, Andrew Haley wrote: >> The current logic for patching is a mess of if-then-elses. By rearranging the logic and using a switch we can make it both easier to understand and faster. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8289743: AArch64: Clean up patching logic > offset = offset << (64-21) >> (64-21); My guess is that ADRP is being used for a target address that is too far away.
The added line above fixes up the high bits so that the assert in spatch() won't fire, hiding the problem: > 239 guarantee (chk == -1 || chk == 0, "Field too big for insn"); ------------- PR: https://git.openjdk.org/jdk/pull/9398 From duke at openjdk.org Thu Jul 21 20:28:01 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 21 Jul 2022 20:28:01 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:37:03 GMT, Evgeny Astigeevich wrote: >> `trampoline_call` can do dummy code generation to calculate the size of C2 generated code. This is done in the output phase. In [src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042](https://github.com/openjdk/jdk/blob/e0d361cea91d3dd1450aece73f660b4abb7ce5fa/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042) Loom code needed to generate a trampoline call outside of C2 and without the output phase. This caused test crashes. Project Loom added `trampoline_call1` to work around the crashes. >> >> This PR improves detection of the C2 output phase, which makes `trampoline_call1` redundant. >> >> Tested the fastdebug/release builds: >> - `gtest`: Passed >> - `tier1`...`tier2`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Replace trampoline_call1 with trampoline_call Andrew (@theRealAph), could you please have a look? Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/9592 From coleenp at openjdk.org Thu Jul 21 20:29:34 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 21 Jul 2022 20:29:34 GMT Subject: RFR: 8290812: Add a test for ResourceHashtable Message-ID: I added a test for ResourceHashtable to show the interactions with it and Symbol* refcounting. Tested with tier1 on Oracle platforms.
------------- Commit messages: - fix wording - 8290812: Add a test for ResourceHashtable Changes: https://git.openjdk.org/jdk/pull/9603/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9603&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290812 Stats: 100 lines in 1 file changed: 99 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9603.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9603/head:pull/9603 PR: https://git.openjdk.org/jdk/pull/9603 From coleenp at openjdk.org Thu Jul 21 20:36:54 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 21 Jul 2022 20:36:54 GMT Subject: RFR: 8290812: Add a test for ResourceHashtable [v2] In-Reply-To: References: Message-ID: > I added a test for ResourceHashtable to show the interactions with it and Symbol* refcounting. > Tested with tier1 on Oracle platforms. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add some more comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9603/files - new: https://git.openjdk.org/jdk/pull/9603/files/f4387ce8..29d6788c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9603&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9603&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9603.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9603/head:pull/9603 PR: https://git.openjdk.org/jdk/pull/9603 From duke at openjdk.org Thu Jul 21 21:01:21 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 21 Jul 2022 21:01:21 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines [v4] In-Reply-To: References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: <3RShBGCAdrdq0SrQuYHV9S3M4tFkhfBG-PjvJHJ3uIM=.83ba723f-53c9-4550-9c2f-1102ef989fb7@github.com> On Thu, 21 Jul 2022 09:49:57 GMT, Yi-Fan Tsai wrote: >> A trampoline stub could be 
generated for each runtime call. These trampolines can be duplicates if the callees are the same. This change delays the stub generation and generates one stub per distinct callee. >> >> Benchmarks als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) were tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows a ~4.7% reduction on average. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Use ResizeableResourceHashtable Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 31: > 29: void CodeBuffer::share_trampoline_for(address dest, int caller_offset) { > 30: if (_shared_trampoline_requests == nullptr) { > 31: _shared_trampoline_requests = new SharedTrampolineRequests(4, 1024); The meaning of `4` and `1024` is not clear. One has to go to the `ResizeableResourceHashtable` definition to understand them. Comments or named constants would be useful. `ResizeableResourceHashtable` uses open hashing. The first parameter is the number of buckets. The second parameter is the maximum number of buckets. `maybe_grow` rebalances the hash table to reduce the lengths of the linked lists. I think you don't need `1024` at all. A reasonable number of buckets should be enough. src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 36: > 34: > 35: bool p_created; > 36: LinkedListImpl* offsets = _shared_trampoline_requests->put_if_absent(dest, &p_created); `typedef LinkedListImpl Offsets` test/hotspot/jtreg/compiler/sharedstubs/SharedTrampolineTest.java line 27: > 25: /** > 26: * @test SharedTrampolineTest > 27: * @summary Checks that stubs to the interpreter can be shared for static or final method. Please correct the summary.
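The "one stub per distinct callee" bookkeeping discussed in this review can be sketched outside HotSpot. The following is only an illustration under stated assumptions: it uses `std::unordered_map` and `std::vector` in place of `ResizeableResourceHashtable` and `LinkedListImpl`, and the class and method names are hypothetical, not the PR's code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model: a destination address and the code offsets of the
// calls that target it. Not the HotSpot data structures.
using Address = std::uintptr_t;
using Offsets = std::vector<int>;

class SharedTrampolineRequestsSketch {
 public:
  // Record that the call at caller_offset needs a trampoline to dest.
  // Returns true when this is the first request for dest, i.e. when a
  // new stub has to be generated later.
  bool request(Address dest, int caller_offset) {
    // emplace leaves an existing entry untouched and reports whether it inserted.
    auto result = requests_.emplace(dest, Offsets{});
    result.first->second.push_back(caller_offset);
    return result.second;
  }

  // Number of stubs needed: one per distinct destination.
  std::size_t stub_count() const { return requests_.size(); }

  // Offsets of all calls that share the stub for dest.
  const Offsets& callers_of(Address dest) const { return requests_.at(dest); }

 private:
  std::unordered_map<Address, Offsets> requests_;
};
```

Three calls to two distinct destinations then need two stubs rather than three, which is the source of the code-size reduction described in the PR.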
test/hotspot/jtreg/compiler/sharedstubs/SharedTrampolineTest.java line 74: > 72: > 73: public static void main(String[] args) throws Exception { > 74: List compilers = java.util.Arrays.asList("-XX:-TieredCompilation" /* C2 */); You have `import java.util.ArrayList`, you can use a short name. ------------- PR: https://git.openjdk.org/jdk/pull/9405 From duke at openjdk.org Thu Jul 21 21:13:21 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 21 Jul 2022 21:13:21 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 18:57:38 GMT, Tom Rodriguez wrote: >> Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Undo applying CompLevel where applicable. It must be a separate change > > Why not be more aggressive about using offsets instead of explicit addresses? Why store both _begin and _end fields? The end of one section is usually the beginning of another section. These explicit fields aren't very nice either. Why not use an enum with arrays for the various values? Then this stuff can all be written in a consistent pattern using offsets and getting the end of a section using the beginning of the next section. Section alignments could also be declared in a consistent way using the enum. > > You could probably smush _is_compiled, _caller_must_gc_arguments, _type and _header_size into a bitfield. Certainly _header_size doesn't need to be a full int since sizeof(nmethod) is only around 400 bytes. @tkrodriguez, > Why not be more aggressive about using offsets instead of explicit addresses? Why store both _begin and _end fields? Using offsets will make it impossible to separate non-code data from the nmethod.
We are working on prototypes to separate non-code data to improve code density in the CodeCache. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From iklam at openjdk.org Thu Jul 21 21:31:08 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 21 Jul 2022 21:31:08 GMT Subject: Withdrawn: 8265473: Move os::Linux to its own header file In-Reply-To: References: Message-ID: On Fri, 8 Jul 2022 06:12:33 GMT, Ioi Lam wrote: > Closed. New PR with a better design is at #9600. > > Another step of moving unnecessary stuff outside of os.hpp > > The `os::Linux` class is used only by the Linux-specific code in HotSpot. Therefore, it should be moved outside of os.hpp, which is used by platform-independent code. > > I don't have a good name for the new header. `os_linux.hpp` would have been a good name, but that's already taken, so I am settling on os_linux.impl.hpp. Suggestions are welcome. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/9423 From iklam at openjdk.org Thu Jul 21 21:37:53 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 21 Jul 2022 21:37:53 GMT Subject: RFR: 8290840: Refactor the "os" class Message-ID: Please see [JDK-8290840](https://bugs.openjdk.org/browse/JDK-8290840) for the detailed proposal. The `os` class, declared in os.hpp, forms the major part of the HotSpot porting interface. Its structure has gradually deteriorated over the years as new ports are created and new APIs are added. This RFE tries to address the following: - Clearly specify where a porting API should be declared and defined among the various `os*.cpp` and `os*.hpp` files. - Avoid the inappropriate inclusion of OS-specific APIs (such as the `os::Linux` class) by platform-independent source files.
------------- Commit messages: - fixed whitespaces - 8290840: Refactor the "os" class Changes: https://git.openjdk.org/jdk/pull/9600/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9600&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290840 Stats: 1348 lines in 95 files changed: 657 ins; 609 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/9600.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9600/head:pull/9600 PR: https://git.openjdk.org/jdk/pull/9600 From never at openjdk.org Thu Jul 21 23:40:02 2022 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 21 Jul 2022 23:40:02 GMT Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: On Wed, 13 Jul 2022 20:39:55 GMT, Boris Ulasevich wrote: >> Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. >> >> Cleanup work: apply CompLevel type where applicable. >> >> The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >> >> Renaissance benchmarks shows no performance regressions on x86 and aarch. 
>>
>> BEFORE:
>>
>> (gdb) ptype /o CodeBlob
>> /* offset    |  size */  type = class CodeBlob {
>> /*    8      |     4 */    const CompilerType _type;     <<<<
>> /*   12      |     4 */    int _size;
>> /*   16      |     4 */    int _header_size;
>> /*   20      |     4 */    int _frame_complete_offset;
>> /*   24      |     4 */    int _data_offset;
>> /*   28      |     4 */    int _frame_size;
>> /*   32      |     8 */    address _code_begin;
>> /*   40      |     8 */    address _code_end;
>> /*   48      |     8 */    address _content_begin;
>> /*   56      |     8 */    address _data_end;
>> /*   64      |     8 */    address _relocation_begin;
>> /*   72      |     8 */    address _relocation_end;
>> /*   80      |     8 */    ImmutableOopMapSet *_oop_maps;
>> /*   88      |     1 */    bool _caller_must_gc_arguments;
>> /*   89      |     1 */    bool _is_compiled;
>> /* XXX  6-byte hole      */
>> /*   96      |     8 */    const char *_name;
>> /*  104      |     8 */    class AsmRemarks {
>> /*  104      |     8 */        AsmRemarkCollection *_remarks;
>>                            } _asm_remarks;
>> /*  112      |     8 */    class DbgStrings {
>> /*  112      |     8 */        DbgStringCollection *_strings;
>>                            } _dbg_strings;
>>
>>                            /* total size (bytes):  120 */
>>                          }
>>
>> AFTER:
>>
>> (gdb) ptype /o CodeBlob
>> /* offset    |  size */  type = class CodeBlob {
>>                          protected:
>> /*    8      |     8 */    address _code_begin;
>> /*   16      |     8 */    address _code_end;
>> /*   24      |     8 */    address _content_begin;
>> /*   32      |     8 */    address _data_end;
>> /*   40      |     8 */    address _relocation_begin;
>> /*   48      |     8 */    address _relocation_end;
>> /*   56      |     8 */    ImmutableOopMapSet *_oop_maps;
>> /*   64      |     8 */    const char *_name;
>> /*   72      |     4 */    int _size;
>> /*   76      |     4 */    int _header_size;
>> /*   80      |     4 */    int _frame_complete_offset;
>> /*   84      |     4 */    int _data_offset;
>> /*   88      |     4 */    int _frame_size;
>> /*   92      |     1 */    bool _caller_must_gc_arguments;
>> /*   93      |     1 */    bool _is_compiled;
>> /*   94      |     1 */    const CompilerType _type;     <<<<
>> /* XXX  1-byte hole      */
>> /*   96      |     8 */    class AsmRemarks {
>> /*   96      |     8 */        AsmRemarkCollection *_remarks;
>>                            } _asm_remarks;
>> /*  104      |     8 */    class DbgStrings {
>> /*  104      |     8 */        DbgStringCollection *_strings;
>>                            } _dbg_strings;
>>
>>                            /* total size (bytes):  112 */
>>                          }
>>
>> BEFORE:
>>
>> (gdb) ptype /o nmethod
>> /* offset    |  size */  type = class nmethod : public CompiledMethod {
>>                          private:
>> /*  208      |     4 */    int _entry_bci;
>> /* XXX  4-byte hole      */
>> /*  216      |     8 */    uint64_t _gc_epoch;
>> /*  224      |     8 */    nmethod *_osr_link;
>> /*  232      |     8 */    nmethod::oops_do_mark_link * volatile _oops_do_mark_link;
>> /*  240      |     8 */    address _entry_point;
>> /*  248      |     8 */    address _verified_entry_point;
>> /*  256      |     8 */    address _osr_entry_point;
>> /*  264      |     4 */    int _exception_offset;
>> /*  268      |     4 */    int _unwind_handler_offset;
>> /*  272      |     4 */    int _consts_offset;
>> /*  276      |     4 */    int _stub_offset;
>> /*  280      |     4 */    int _oops_offset;
>> /*  284      |     4 */    int _metadata_offset;
>> /*  288      |     4 */    int _scopes_data_offset;
>> /*  292      |     4 */    int _scopes_pcs_offset;
>> /*  296      |     4 */    int _dependencies_offset;
>> /*  300      |     4 */    int _handler_table_offset;
>> /*  304      |     4 */    int _nul_chk_table_offset;
>> /*  308      |     4 */    int _speculations_offset;
>> /*  312      |     4 */    int _jvmci_data_offset;
>> /*  316      |     4 */    int _nmethod_end_offset;
>> /*  320      |     4 */    int _orig_pc_offset;
>> /*  324      |     4 */    int _compile_id;
>> /*  328      |     4 */    int _comp_level;              <<<<
>> /*  332      |     1 */    bool _has_flushed_dependencies;
>> /*  333      |     1 */    bool _unload_reported;
>> /*  334      |     1 */    bool _load_reported;
>> /*  335      |     1 */    volatile signed char _state;
>> /*  336      |     1 */    bool _oops_are_stale;
>> /* XXX  3-byte hole      */
>> /*  340      |     4 */    RTMState _rtm_state;
>> /*  344      |     4 */    volatile jint _lock_count;
>> /* XXX  4-byte hole      */
>> /*  352      |     8 */    volatile int64_t _stack_traversal_mark;
>> /*  360      |     4 */    int _hotness_counter;
>> /*  364      |     1 */    volatile uint8_t _is_unloading_state;
>> /* XXX  3-byte hole      */
>> /*  368      |     4 */    ByteSize _native_receiver_sp_offset;
>> /*  372      |     4 */    ByteSize _native_basic_lock_sp_offset;
>>
>>                            /* total size (bytes):  376 */
>>                          }
>>
>> AFTER:
>>
>> (gdb) ptype /o nmethod
>> /* offset    |  size */  type = class nmethod : public CompiledMethod {
>> /*  200      |     8 */    uint64_t _gc_epoch;
>> /*  208      |     8 */    volatile int64_t _stack_traversal_mark;
>> /*  216      |     8 */    nmethod *_osr_link;
>> /*  224      |     8 */    nmethod::oops_do_mark_link * volatile _oops_do_mark_link;
>> /*  232      |     8 */    address _entry_point;
>> /*  240      |     8 */    address _verified_entry_point;
>> /*  248      |     8 */    address _osr_entry_point;
>> /*  256      |     4 */    int _entry_bci;
>> /*  260      |     4 */    int _exception_offset;
>> /*  264      |     4 */    int _unwind_handler_offset;
>> /*  268      |     4 */    int _consts_offset;
>> /*  272      |     4 */    int _stub_offset;
>> /*  276      |     4 */    int _oops_offset;
>> /*  280      |     4 */    int _metadata_offset;
>> /*  284      |     4 */    int _scopes_data_offset;
>> /*  288      |     4 */    int _scopes_pcs_offset;
>> /*  292      |     4 */    int _dependencies_offset;
>> /*  296      |     4 */    int _handler_table_offset;
>> /*  300      |     4 */    int _nul_chk_table_offset;
>> /*  304      |     4 */    int _speculations_offset;
>> /*  308      |     4 */    int _jvmci_data_offset;
>> /*  312      |     4 */    int _nmethod_end_offset;
>> /*  316      |     4 */    int _orig_pc_offset;
>> /*  320      |     4 */    int _compile_id;
>> /*  324      |     4 */    RTMState _rtm_state;
>> /*  328      |     4 */    volatile jint _lock_count;
>> /*  332      |     4 */    int _hotness_counter;
>> /*  336      |     4 */    ByteSize _native_receiver_sp_offset;
>> /*  340      |     4 */    ByteSize _native_basic_lock_sp_offset;
>> /*  344      |     1 */    CompLevel _comp_level;        <<<<
>> /*  345      |     1 */    volatile uint8_t _is_unloading_state;
>> /*  346      |     1 */    bool _has_flushed_dependencies;
>> /*  347      |     1 */    bool _unload_reported;
>> /*  348      |     1 */    bool _load_reported;
>> /*  349      |     1 */    volatile signed char _state;
>> /*  350      |     1 */    bool _oops_are_stale;
>>
>>                            /* total size (bytes):  352 */
>>                          }
>
> Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   Undo applying CompLevel where applicable. It must be a separate change

That's good to hear. I've always thought that would be a good idea. It seems like you'd still be able to separate it into two chunks and use offsets within those.
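
To make the padding argument in the quoted PR description concrete, here is a minimal sketch of the same idea on a toy struct. The struct and field names are made up for illustration and do not match the real `nmethod` layout; the exact sizes depend on the ABI, so the code only checks that the sorted layout is no larger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Declaration order mixes field sizes, so the compiler typically has to
// insert alignment holes between the fields and at the end.
struct Unsorted {
  int32_t  entry_bci;   // 4 bytes, usually followed by a 4-byte hole
  uint64_t gc_epoch;    // 8 bytes
  bool     flag;        // 1 byte, usually followed by a 3-byte hole
  int32_t  comp_level;  // 4 bytes, although the value fits in one byte
  uint8_t  state;       // 1 byte, plus tail padding
};

// Same information, ordered from largest to smallest, with the level
// narrowed to a single byte.
struct Sorted {
  uint64_t gc_epoch;
  int32_t  entry_bci;
  uint8_t  comp_level;
  uint8_t  state;
  bool     flag;
};

static_assert(sizeof(Sorted) <= sizeof(Unsorted),
              "sorting fields by size must not grow the struct");

// Bytes saved per object on the current target.
constexpr std::size_t bytes_saved() {
  return sizeof(Unsorted) - sizeof(Sorted);
}
```

On common 64-bit ABIs the unsorted variant typically needs twice the space of the sorted one; `ptype /o` in gdb, as used in the PR description, shows where the holes are.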
------------- PR: https://git.openjdk.org/jdk/pull/9165 From john.r.rose at oracle.com Thu Jul 21 23:48:06 2022 From: john.r.rose at oracle.com (John Rose) Date: Thu, 21 Jul 2022 16:48:06 -0700 Subject: RFR: 8288477: nmethod header size reduction [v3] In-Reply-To: References: Message-ID: <87988806-56D9-4C59-BFF9-82C87902771F@oracle.com> I wish we had a more systematic way to handle packed structs with many variable-sized segments. It's a problem in metadata as well. If there were something (with enums? offset arrays? iterators? bitmaps for conditional elements? all of the above?) that could handle such structs, that would be the place to put all of our heroics, rather than one place at a time, as with nmethods in this case. On 21 Jul 2022, at 16:40, Tom Rodriguez wrote: > On Wed, 13 Jul 2022 20:39:55 GMT, Boris Ulasevich wrote: > >>> Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. >>> >>> Cleanup work: apply CompLevel type where applicable. >>> >>> The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >>> >>> Renaissance benchmarks shows no performance regressions on x86 and aarch.
>>> >>> [snip: CodeBlob and nmethod `ptype /o` layouts, quoted in full above] >> >> Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Undo applying CompLevel where applicable.
It must be a separate change > > That's good to hear. I've always thought that would be a good idea. It seems like you'd still be able to separate it into two chunks and use offsets within those. > > ------------- > > PR: https://git.openjdk.org/jdk/pull/9165 From coleenp at openjdk.org Fri Jul 22 01:06:07 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 22 Jul 2022 01:06:07 GMT Subject: RFR: 8290812: Add a test for ResourceHashtable [v3] In-Reply-To: References: Message-ID: > I added a test for ResourceHashtable to show the interactions with it and Symbol* refcounting. > Tested with tier1 on Oracle platforms. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Clarify the do_entry functions in two deleter classes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9603/files - new: https://git.openjdk.org/jdk/pull/9603/files/29d6788c..64f3338e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9603&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9603&range=01-02 Stats: 11 lines in 1 file changed: 8 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9603.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9603/head:pull/9603 PR: https://git.openjdk.org/jdk/pull/9603 From dlong at openjdk.org Fri Jul 22 01:16:10 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Jul 2022 01:16:10 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information In-Reply-To: References: Message-ID: <1RRGLN3oyV27P-eCojRAE4rVVlFqddZfuVKOTHZfJpI=.828857c9-41b4-4299-8796-3a3eca4c1735@github.com> On Thu, 21 Jul 2022 18:36:43 GMT, Julian Waters wrote: > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. 
Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. src/hotspot/share/compiler/compilationPolicy.hpp line 48: > 46: * all data from the MDO will be loaded into the ciMethodData when it is first created. > 47: * (See ciMethod::method_data() in ciMethod.cpp for more details) > 48: * The ciMethodData is just a temporary snapshot. Updates to the profiling data is still done through the MethodData. src/hotspot/share/oops/methodData.hpp line 41: > 39: > 40: // The MethodData object collects counts and other profile information > 41: // during zeroth-tier (interpreter) execution. This should probably say levels 0 and 3. ------------- PR: https://git.openjdk.org/jdk/pull/9598 From stuefe at openjdk.org Fri Jul 22 05:55:06 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 22 Jul 2022 05:55:06 GMT Subject: RFR: 8290812: Add a test for ResourceHashtable [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 01:06:07 GMT, Coleen Phillimore wrote: >> I added a test for ResourceHashtable to show the interactions with it and Symbol* refcounting. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Clarify the do_entry functions in two deleter classes. Hi Coleen, this looks okay. You basically test that both table::remove() and table::unlink() call ~Value(), right? If you are worried about that, you may want to add a test that ~Table() also cleans up correctly. 
So, when the table is gone, all Values should have been destructed and refcounts should be reverted to their originals. test/hotspot/gtest/utilities/test_resourceHash.cpp line 354: > 352: ASSERT_EQ(s->refcount(), s_orig_count + 3) << "refcount incremented"; > 353: } > 354: ASSERT_EQ(s->refcount(), s_orig_count + 2) << "refcount not copied"; I may misunderstand things, but why do you test the same sequence twice? The first one above seems not to match your description since the value holds the Symbol secure during its lifetime. If you wanted to test what the comment describes you would add a value that does nothing to the Symbol*. But I'm not sure that is even worth testing since it is clear what happens (if you put Symbol* as key and whatever as value, nobody will increase the refcount of Symbol*). ------------- PR: https://git.openjdk.org/jdk/pull/9603 From fyang at openjdk.org Fri Jul 22 06:52:57 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 22 Jul 2022 06:52:57 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. 
>> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work Performed tier1 test on riscv64-linux. Result looks good. ------------- PR: https://git.openjdk.org/jdk/pull/9576 From jwaters at openjdk.org Fri Jul 22 07:33:59 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 07:33:59 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v2] In-Reply-To: References: Message-ID: > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. 
Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Correct comment with respect to review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9598/files - new: https://git.openjdk.org/jdk/pull/9598/files/57d60a68..7e485ac4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9598.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9598/head:pull/9598 PR: https://git.openjdk.org/jdk/pull/9598 From jwaters at openjdk.org Fri Jul 22 07:34:02 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 07:34:02 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v2] In-Reply-To: <1RRGLN3oyV27P-eCojRAE4rVVlFqddZfuVKOTHZfJpI=.828857c9-41b4-4299-8796-3a3eca4c1735@github.com> References: <1RRGLN3oyV27P-eCojRAE4rVVlFqddZfuVKOTHZfJpI=.828857c9-41b4-4299-8796-3a3eca4c1735@github.com> Message-ID: On Fri, 22 Jul 2022 01:13:50 GMT, Dean Long wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Correct comment with respect to review > > src/hotspot/share/oops/methodData.hpp line 41: > >> 39: >> 40: // The MethodData object collects counts and other profile information >> 41: // during zeroth-tier (interpreter) execution. > > This should probably say levels 0 and 3. 
Updated, thanks for the correction ------------- PR: https://git.openjdk.org/jdk/pull/9598 From jwaters at openjdk.org Fri Jul 22 07:43:11 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 07:43:11 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v2] In-Reply-To: <1RRGLN3oyV27P-eCojRAE4rVVlFqddZfuVKOTHZfJpI=.828857c9-41b4-4299-8796-3a3eca4c1735@github.com> References: <1RRGLN3oyV27P-eCojRAE4rVVlFqddZfuVKOTHZfJpI=.828857c9-41b4-4299-8796-3a3eca4c1735@github.com> Message-ID: On Fri, 22 Jul 2022 01:11:04 GMT, Dean Long wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Correct comment with respect to review > > src/hotspot/share/compiler/compilationPolicy.hpp line 48: > >> 46: * all data from the MDO will be loaded into the ciMethodData when it is first created. >> 47: * (See ciMethod::method_data() in ciMethod.cpp for more details) >> 48: * > > The ciMethodData is just a temporary snapshot. Updates to the profiling data is still done through the MethodData. The only place I could find a MethodData being created is in the method // Build a MethodData* object to hold information about this method // collected in the interpreter. void Method::build_interpreter_method_data(const methodHandle& method, TRAPS) which presumably constructs a MethodData for profiling in the Interpreter. Is there a different area where it's created (when profiling with C1) that I missed? 
------------- PR: https://git.openjdk.org/jdk/pull/9598 From aph at openjdk.org Fri Jul 22 08:12:04 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 08:12:04 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Thu, 21 Jul 2022 20:16:11 GMT, Dean Long wrote: > > offset = offset << (64-21) >> (64-21); > > My guess is that ADRP is being used for a target address that is too far away. The added line above fixes up the high bits so that the assert in spatch() won't fire, hiding the problem: > > > 239 guarantee (chk == -1 || chk == 0, "Field too big for insn"); I've reproduced the problem and I'm now looking at the cleanest way to fix it. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From aph at openjdk.org Fri Jul 22 08:19:04 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 08:19:04 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:37:03 GMT, Evgeny Astigeevich wrote: >> `trampoline_call` can do dummy code generation to calculate the size of C2 generated code. This is done in the output phase. In [src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042](https://github.com/openjdk/jdk/blob/e0d361cea91d3dd1450aece73f660b4abb7ce5fa/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042) Loom code needed to generate a trampoline call outside of C2 and without the output phase. This caused test crashes. The project Loom added `trampoline_call1` to workaround the crashes. >> >> This PR improves detection of C2 output phase which makes `trampoline_call1` redundant. 
>> >> Tested the fastdebug/release builds: >> - `gtest`: Passed >> - `tier1`...`tier2`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Replace trampoline_call1 with trampoline_call src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 637: > 635: // code. > 636: PhaseOutput* phase_output = Compile::current()->output(); > 637: in_scratch_emit_size = Looks reasonable enough. The only change is to check for `Compile::current()->output()` being null, right? ------------- PR: https://git.openjdk.org/jdk/pull/9592 From aph at openjdk.org Fri Jul 22 08:20:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 08:20:05 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines [v4] In-Reply-To: References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: <2yzoo-9v59f9D8mZuYQp9qL-ufvo3-0T1aKuGA65YQ4=.e33d044f-5f60-464e-b6b4-b06496a59f11@github.com> On Thu, 21 Jul 2022 09:49:57 GMT, Yi-Fan Tsai wrote: >> A trampoline stub could be generated for each runtime call. These trampolines are duplicates if the callees are the same. This change delays the stub generation and generates one stub per distinct callee. >> >> The benchmarks als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) were tested. The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Use ResizeableResourceHashtable I'm still looking at this, but it seems rather eccentric to create an array of all the calls, then create a hash table from the array.
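
As a rough illustration of the alternative hinted at here, the sketch below records each call site directly in a map keyed by callee, so that one stub per distinct callee falls out without an intermediate array. It is not the PR's implementation: `std::unordered_map` is swapped in for HotSpot's `ResizeableResourceHashtable`, and `TrampolineRequests`, `Address`, and the method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Address = std::uintptr_t;  // stand-in for HotSpot's address type

// Collects trampoline requests during code emission, grouped by callee.
class TrampolineRequests {
 public:
  // Remember that the call instruction at `call_offset` targets `callee`.
  void add(Address callee, int call_offset) {
    by_callee_[callee].push_back(call_offset);
  }

  // Number of stubs to emit: one per distinct callee.
  std::size_t stub_count() const { return by_callee_.size(); }

  // Total number of recorded call sites across all callees.
  std::size_t call_count() const {
    std::size_t n = 0;
    for (const auto& entry : by_callee_) n += entry.second.size();
    return n;
  }

  // Call sites that would share the stub for `callee`.
  const std::vector<int>& calls_to(Address callee) const {
    return by_callee_.at(callee);
  }

 private:
  std::unordered_map<Address, std::vector<int>> by_callee_;
};
```

Three calls to two distinct callees would then need two stubs rather than three, which is the kind of saving the PR description reports.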
------------- PR: https://git.openjdk.org/jdk/pull/9405 From dlong at openjdk.org Fri Jul 22 08:56:10 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Jul 2022 08:56:10 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 07:33:59 GMT, Julian Waters wrote: >> Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Correct comment with respect to review C1 uses ciMethod::ensure_method_data(), which calls Method::build_interpreter_method_data(), to create an MDO if one wasn't already created by the interpreter. So the name build_interpreter_method_data() is a bit misleading, because C1 will use the same MDO as the interpreter. I also found a comment in c1_globals.hpp about C1UpdateMethodData that mentions tier1. I think the comment should be changed to say tier3. 
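
The point about the interpreter and C1 sharing one MDO can be modelled with a small sketch. This is a simplified stand-in, not HotSpot code: `MethodModel`, `ProfileData`, `ensure_profile`, and the two invoke helpers are hypothetical names, and the real creation path has to deal with concurrency and allocation failure, which are ignored here.

```cpp
#include <cassert>
#include <memory>

// Stand-in for MethodData: just one counter.
struct ProfileData {
  int invocations = 0;
};

// Stand-in for a Method that owns at most one profile object.
class MethodModel {
 public:
  // Create the profile on first request; later callers get the same object,
  // regardless of which execution tier asked first.
  ProfileData* ensure_profile() {
    if (!profile_) profile_ = std::make_unique<ProfileData>();
    return profile_.get();
  }

  bool has_profile() const { return profile_ != nullptr; }

 private:
  std::unique_ptr<ProfileData> profile_;
};

// Both tiers update the same profile object.
inline void interpreter_invoke(MethodModel& m) {
  m.ensure_profile()->invocations++;
}

inline void c1_profiled_invoke(MethodModel& m) {
  m.ensure_profile()->invocations++;
}
```

Whichever tier runs first creates the object, and the counts from both end up in one place, which is why a name that mentions only the interpreter can mislead.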
------------- PR: https://git.openjdk.org/jdk/pull/9598 From dlong at openjdk.org Fri Jul 22 08:58:47 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Jul 2022 08:58:47 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Fri, 22 Jul 2022 08:08:40 GMT, Andrew Haley wrote: > I've reproduced the problem and I'm now looking at the cleanest way to fix it. You can check out the patch I uploaded to JDK-8290780 and see if I'm on the right track or not :-) ------------- PR: https://git.openjdk.org/jdk/pull/9398 From jwaters at openjdk.org Fri Jul 22 11:37:04 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 11:37:04 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v3] In-Reply-To: References: Message-ID: <2kb66TqyzzMBV6rDkuLvzC4rXegtF6mm_xYT4TfgcD8=.ec45036a-2227-416a-bba6-25cb9f959893@github.com> > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: - Merge branch 'profile' of https://github.com/TheShermanTanker/jdk into profile - Correct comment with respect to review - Update compilationPolicy.hpp - Minor comment cleanup - Merge remote-tracking branch 'upstream/master' into profile - Better clarify documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9598/files - new: https://git.openjdk.org/jdk/pull/9598/files/7e485ac4..fba6dd27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=01-02 Stats: 318 lines in 45 files changed: 213 ins; 20 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/9598.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9598/head:pull/9598 PR: https://git.openjdk.org/jdk/pull/9598 From snazarki at openjdk.org Fri Jul 22 11:43:06 2022 From: snazarki at openjdk.org (Sergey Nazarkin) Date: Fri, 22 Jul 2022 11:43:06 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. 
>> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work Got assert on arm-server-fastdebug: # Internal Error (/home/user/jdk/src/hotspot/cpu/arm/arm.ad:240), pid=24155, tid=24168 # assert(constant_table.size() == consts_size) failed: must be: 208 == 200 # # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.user.jdk) # Java VM: OpenJDK Server VM (fastdebug 20-internal-adhoc.user.jdk, mixed mode, g1 gc, linux-arm) # Problematic frame: # V [libjvm.so+0xe2ed8] MachConstantBaseNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x22b ------------- PR: https://git.openjdk.org/jdk/pull/9576 From jwaters at openjdk.org Fri Jul 22 11:45:05 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 11:45:05 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v4] In-Reply-To: References: Message-ID: > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at 
all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Rectify incorrect Tier 1 message for C1UpdateMethodData ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9598/files - new: https://git.openjdk.org/jdk/pull/9598/files/fba6dd27..f6dc3526 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9598.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9598/head:pull/9598 PR: https://git.openjdk.org/jdk/pull/9598 From jwaters at openjdk.org Fri Jul 22 11:47:09 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 11:47:09 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v5] In-Reply-To: References: Message-ID: > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. 
This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Quick fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9598/files - new: https://git.openjdk.org/jdk/pull/9598/files/f6dc3526..bf838048 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9598.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9598/head:pull/9598 PR: https://git.openjdk.org/jdk/pull/9598 From coleenp at openjdk.org Fri Jul 22 11:59:00 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 22 Jul 2022 11:59:00 GMT Subject: RFR: 8290812: Add a test for ResourceHashtable [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 01:06:07 GMT, Coleen Phillimore wrote: >> I added a test for ResourceHashtable to show the interactions with it and Symbol* refcounting. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Clarify the do_entry functions in two deleter classes. Hi Thomas, yes, I was trying to show that when the Value is copied into the ResourceHashtable node, its destructor is called. When it's a pointer, its destructor is not called and you have to call delete (or do the needed cleanup) in the do_entry function or outside the remove call.
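The value-versus-pointer point above can be illustrated with a small standalone sketch. It uses `std::unordered_map` rather than HotSpot's `ResourceHashtable`, and the `Tracked` type and both helper functions are hypothetical names made up for this illustration; it only shows the general C++ behaviour, not the actual gtest.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical value type that counts how many instances are alive.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  Tracked(const Tracked&) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Stored by value: the table node holds a copy, and erasing the entry
// runs the destructor of that copy.
int live_after_erasing_value() {
  std::unordered_map<std::string, Tracked> table;
  {
    Tracked t;
    table.emplace("k", t);  // copy-constructs the stored value
  }                         // local t is destroyed, the stored copy remains
  table.erase("k");         // destructor of the stored copy runs here
  return Tracked::live;
}

// Stored by pointer: erasing the entry only drops the pointer, so the
// pointee stays alive until somebody deletes it explicitly.
int live_after_erasing_pointer() {
  std::unordered_map<std::string, Tracked*> table;
  Tracked* p = new Tracked();
  table.emplace("k", p);
  table.erase("k");                // pointee is NOT destroyed
  int still_live = Tracked::live;  // the object outlives its table entry
  delete p;                        // cleanup is the caller's job
  return still_live;
}
```

In this sketch the by-value case leaves no live objects after the erase, while the by-pointer case still has one until the explicit `delete`.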
------------- PR: https://git.openjdk.org/jdk/pull/9603 From coleenp at openjdk.org Fri Jul 22 11:59:03 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 22 Jul 2022 11:59:03 GMT Subject: RFR: 8290812: Add a test for ResourceHashtable [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 05:42:35 GMT, Thomas Stuefe wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Clarify the do_entry functions in two deleter classes. > > test/hotspot/gtest/utilities/test_resourceHash.cpp line 354: > >> 352: ASSERT_EQ(s->refcount(), s_orig_count + 3) << "refcount incremented"; >> 353: } >> 354: ASSERT_EQ(s->refcount(), s_orig_count + 2) << "refcount not copied"; > > I may misunderstand things, but why do you test the same sequence twice? The first one above seems not to match your description since the value holds the Symbol secure during its lifetime. > > If you wanted to test what the comment describes you would add a value that does nothing to the Symbol*. But I'm not sure that is even worth testing since it is clear what happens (if you put Symbol* as key and whatever as value, nobody will increase the refcount of Symbol*). Yes, the first line (344) holds the value through the lifetime but I assert that at the end the refcount is where we started. The second sequence tests the unlink function, which acts differently than remove. In remove, we have to do any refcounting cleanup for the key except destruction because remove doesn't have a callback. In unlink, we have to do the cleanup in the callback. I'm not sure what your suggested test would do, except underflow the refcount of the symbol, which I don't want to do because then it'd be a negative test. 
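A toy model of the `remove` versus `unlink` distinction described above, assuming a table that never touches refcounts itself. `ToyTable` and `RefCounted` are hypothetical stand-ins invented for this sketch and do not mirror the real `ResourceHashtable` or `Symbol` interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>

// Hypothetical refcounted object standing in for a Symbol.
struct RefCounted {
  int refcount = 1;
  void increment() { ++refcount; }
  void decrement() { --refcount; }
};

// Toy table keyed by raw pointer; it never adjusts refcounts on its own.
class ToyTable {
  std::map<RefCounted*, int> _entries;
 public:
  void put(RefCounted* key, int value) { _entries[key] = value; }

  // remove(): there is no callback, so any refcount cleanup for the key
  // has to happen in the caller, outside this call.
  bool remove(RefCounted* key) { return _entries.erase(key) == 1; }

  // unlink(): erases every entry for which the callback returns true;
  // the callback is where per-entry cleanup has to happen.
  void unlink(const std::function<bool(RefCounted*, int)>& do_entry) {
    for (auto it = _entries.begin(); it != _entries.end(); ) {
      if (do_entry(it->first, it->second)) {
        it = _entries.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::size_t size() const { return _entries.size(); }
};

// Cleanup done by the caller after remove().
int refcount_after_remove() {
  RefCounted s;
  ToyTable t;
  s.increment();                    // reference held on behalf of the table
  t.put(&s, 1);
  if (t.remove(&s)) s.decrement();  // release it outside the remove call
  return s.refcount;
}

// Cleanup done inside the unlink() callback.
int refcount_after_unlink() {
  RefCounted s;
  ToyTable t;
  s.increment();
  t.put(&s, 1);
  t.unlink([](RefCounted* key, int) { key->decrement(); return true; });
  return s.refcount;
}
```

Both paths end with the refcount back where it started, which is the property the test discussion is about; skipping the decrement in either place would leave the count one too high.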
------------- PR: https://git.openjdk.org/jdk/pull/9603 From jwaters at openjdk.org Fri Jul 22 13:42:09 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 13:42:09 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v6] In-Reply-To: References: Message-ID: > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. 
Julian Waters has updated the pull request incrementally with one additional commit since the last revision: New changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9598/files - new: https://git.openjdk.org/jdk/pull/9598/files/bf838048..c48d67f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=04-05 Stats: 15 lines in 2 files changed: 4 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/9598.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9598/head:pull/9598 PR: https://git.openjdk.org/jdk/pull/9598 From jwaters at openjdk.org Fri Jul 22 13:42:11 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 22 Jul 2022 13:42:11 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v6] In-Reply-To: References: <1RRGLN3oyV27P-eCojRAE4rVVlFqddZfuVKOTHZfJpI=.828857c9-41b4-4299-8796-3a3eca4c1735@github.com> Message-ID: On Fri, 22 Jul 2022 07:41:09 GMT, Julian Waters wrote: >> src/hotspot/share/compiler/compilationPolicy.hpp line 48: >> >>> 46: * all data from the MDO will be loaded into the ciMethodData when it is first created. >>> 47: * (See ciMethod::method_data() in ciMethod.cpp for more details) >>> 48: * >> >> The ciMethodData is just a temporary snapshot. Updates to the profiling data is still done through the MethodData. > > The only place I could find a MethodData being created is in the method > > // Build a MethodData* object to hold information about this method > // collected in the interpreter. > void Method::build_interpreter_method_data(const methodHandle& method, TRAPS) > > which presumably constructs a MethodData for profiling in the Interpreter. Is there a different area where it's created (when profiling with C1) that I missed? 
hopefully resolved by the newest changes ------------- PR: https://git.openjdk.org/jdk/pull/9598 From aph at openjdk.org Fri Jul 22 13:58:22 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 13:58:22 GMT Subject: RFR: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java Message-ID: Fix that masks the offsets used when adrp() is passed an unreachable destination. This reloc allows e.g. `adrp; movk; ldr` to access anywhere in the address space. # SIGSEGV (0xb) at pc=0x0000ffff55964edc, pid=2843096, tid=2850366 # # JRE version: Java(TM) SE Runtime Environment (20.0+7) (fastdebug build 20-ea+7-377) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+7-377, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # J 91101 c2 java.io.ObjectOutputStream.enableReplaceObject(Z)Z java.base at 20-ea (47 bytes) @ 0x0000ffff55964edc [0x0000ffff55964e80+0x000000000000005c] ------------- Commit messages: - 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java - 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java - 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java Changes: https://git.openjdk.org/jdk/pull/9615/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9615&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290780 Stats: 28 lines in 2 files changed: 4 ins; 21 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9615.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9615/head:pull/9615 PR: https://git.openjdk.org/jdk/pull/9615 From rrich at openjdk.org Fri Jul 22 14:06:00 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 22 Jul 2022 14:06:00 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers [v4] In-Reply-To: References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> Message-ID: 
<3lcL_G7ECuYgh51Z5tJ6G_6H448Pnc6emZ7CLBtZXQw=.29049a37-f282-4e2d-91c3-b5641ea33bd5@github.com> On Wed, 20 Jul 2022 16:34:51 GMT, Erik Österlund wrote: >> The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas. >> >> 1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html). >> >> 2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code. >> >> 3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened. I can get behind that. >> >> Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > fixing 32 bit build again Hi Erik, Looks good to me! I can imagine that hot taken branches, especially many of them, can become a performance issue when the hardware fails to predict the target address. Richard. ------------- Marked as reviewed by rrich (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9569 From duke at openjdk.org Fri Jul 22 14:29:00 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Fri, 22 Jul 2022 14:29:00 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 08:15:50 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace trampoline_call1 with trampoline_call > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 637: > >> 635: // code. >> 636: PhaseOutput* phase_output = Compile::current()->output(); >> 637: in_scratch_emit_size = > > Looks reasonable enough. The only change is to check for `Compile::current()->output()` being null, right? Do you mean `Compile::current()`? `Compile::current()->output()` is checked in the expression for `in_scratch_emit_size` as `phase_output != NULL`. ------------- PR: https://git.openjdk.org/jdk/pull/9592 From aph at openjdk.org Fri Jul 22 14:34:10 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 14:34:10 GMT Subject: RFR: 8289743: AArch64: Clean up patching logic [v13] In-Reply-To: References: <4Qb7vMjQc74DnuHwV7CkxJqtGFeATZQVyQvragbt3CU=.1ee6210a-700e-4855-9b9a-67850c8ec8ba@github.com> Message-ID: On Fri, 22 Jul 2022 08:55:27 GMT, Dean Long wrote: > > I've reproduced the problem and I'm now looking at the cleanest way to fix it. > > You can check out the patch I uploaded to JDK-8290780 and see if I'm on the right track or not :-) Yes, it's much the same. I fixed the verification logic too. ------------- PR: https://git.openjdk.org/jdk/pull/9398 From richard.reingruber at sap.com Fri Jul 22 14:41:54 2022 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 22 Jul 2022 14:41:54 +0000 Subject: State of the ppc64le port of JEP 425: Virtual Threads (Preview) In-Reply-To: References: Message-ID: Hi, UseContinuationFastPath is now working and enabled by default.
The current version should be close to the final one modulo cleanups.

Testing:

==============================
Test summary
==============================
   TEST                                      TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg:hotspot_loom        63    63     0     0
>> jtreg:test/jdk:jdk_loom                     213   209     4     0 <<
==============================
TEST FAILURE

Failures are because of wrong monitor count ([1]?) except one where the method Fuzz.com_pin(int,int) is not compilable.

The rather important test jdk/jdk/internal/vm/Continuation/Basic.java succeeds.

Next will be cleanup, rebasing, testing. Also I want to split off some shared code changes to reduce the size of the port. There is already a really small one out for review https://bugs.openjdk.org/browse/JDK-8289925 (ping?)

The majority of shared code changes are required because on ppc the metadata for call linkage (return address, stack pointer) is stored at the top of the caller frame. So it is external to the callee frame as the stack arguments are. The size of that metadata has to be accounted for, very much like the size of the stack arguments. On x86 the metadata is stored in the callee frame. It is not part of the caller-callee overlap.

I hope to start the review of the port in September after summer holidays.

The current version can again be found here: https://github.com/reinrich/loom/commits/ppc_port

Richard.

[1] https://bugs.openjdk.org/browse/JDK-8286957

__________________________________________________________________
From: Reingruber, Richard
Sent: Monday, July 11, 2022 14:07
To: hotspot-dev at openjdk.java.net ; porters-dev at openjdk.java.net
Subject: Re: State of the ppc64le port of JEP 425: Virtual Threads (Preview)

Hi,

the port now passes the basic continuation tests jdk/jdk/internal/vm/Continuation/Basic.java with UseContinuationFastPath disabled.

Actually all tests in hotspot_loom and jdk_loom succeed except for 2 of them where the held monitor count is wrong (potentially caused by [1]) and another one with a method that is not compilable.

The current version can again be found here: https://github.com/reinrich/loom/commits/ppc_port

Richard.

[1] https://bugs.openjdk.org/browse/JDK-8286957

__________________________________________________________________
From: Reingruber, Richard
Date: Thursday, 2. June 2022 at 13:38
To: jdk-dev , porters-dev at openjdk.java.net , loom-dev at openjdk.java.net
Subject: State of the ppc64le port of JEP 425: Virtual Threads (Preview)

Hi,

I learned today that preview features _must_ be implemented by a port in an OpenJDK release [1]. Unfortunately I have to inform you that I don't think the ppc64le port I'm currently working on will be ready in the JDK19 time frame.

When I started the work (Jan. or Dec. I think) I expected to finish it before summer. Even after the last status update [2] I thought I could make it. But with the difficulties I still experience and being 6-8 weeks out of office in summer it is now rather unlikely. And until this morning I (and actually also my colleagues) assumed this would only be a minor issue.

Current Status of the Port:

* UseContinuationFastPath is disabled
* Basic tests where sequences of interpreted and compiled frames with quite some variations are frozen and thawed succeed.
* GC with stack chunks on the java heap succeed.
* Basic exception handling tests succeed.
* Basic tests exercising compiled java calls with stack arguments succeed but need to be revisited because there are issues.

[3] is a selection of test cases that I use in development.
[4] is the most recent version of the ppc64le port

Main Technical Problems

* Shared code makes use of the 'unextended sp' of java frames. This breaks the platform abstraction as it makes assumptions on where to find, e.g., stack arguments relative to the unextended sp.
* There are non-obvious interdependencies in the code which make it difficult to fix an issue. In an attempt to fix a problem I often have regressions because I missed adaptations of dependent parts. And then it is extremely tedious to find the cause of the regression running tests and analyzing very long trace output.
* Currently I see that the handling of stack arguments of compiled java methods works in quite some cases (see [3]) but there are cases where it doesn't. Trying alternative approaches means going through the tedious and time consuming process described above.
* Lack of documentation. Heavily templatized implementation.

These problems (except the last) could not be foreseen. From a high level the port simply needs to copy frames between stack and heap and provide some assembler glue code. As I know now it is actually a high effort to get the details tuned right.

Thanks, Richard.

[1] Ports _must_ implement preview features in thread "What should the relationship between ports and developers of large projects be?"
    https://mail.openjdk.java.net/pipermail/jdk-dev/2022-May/006635.html
[2] State of the ppc64le loom port as of April 14
    https://mail.openjdk.java.net/pipermail/loom-dev/2022-April/004197.html
[3] BasicExp.java tests driving development of the port
    https://github.com/reinrich/loom/blob/3286bc8b72401dbccac59c994919fc425a51cb52/test/jdk/jdk/internal/vm/Continuation/BasicExp.java
[4] Most recent version of the ppc64le loom port
https://github.com/reinrich/loom/commits/ppc_port From eosterlund at openjdk.org Fri Jul 22 14:43:03 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 22 Jul 2022 14:43:03 GMT Subject: RFR: 8290688: Optimize x86_64 nmethod entry barriers [v4] In-Reply-To: <3lcL_G7ECuYgh51Z5tJ6G_6H448Pnc6emZ7CLBtZXQw=.29049a37-f282-4e2d-91c3-b5641ea33bd5@github.com> References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> <3lcL_G7ECuYgh51Z5tJ6G_6H448Pnc6emZ7CLBtZXQw=.29049a37-f282-4e2d-91c3-b5641ea33bd5@github.com> Message-ID: On Fri, 22 Jul 2022 14:03:58 GMT, Richard Reingruber wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing 32 bit build again > > Hi Erik, > > Looks good to me! > > I can imagine that hot taken branches, especially many of them, can become a > performance issue when the hardware fails to predict the target address. > > Richard. Thanks for the review @reinrich! ------------- PR: https://git.openjdk.org/jdk/pull/9569 From eosterlund at openjdk.org Fri Jul 22 14:45:19 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 22 Jul 2022 14:45:19 GMT Subject: Integrated: 8290688: Optimize x86_64 nmethod entry barriers In-Reply-To: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> References: <9b8WHjWu5f-PI7kN2roMQs_1SIKX6NpW908Y-8ZJZWY=.73c2b48e-284a-4507-aed3-5e0c7b30a1da@github.com> Message-ID: On Wed, 20 Jul 2022 11:51:08 GMT, Erik Österlund wrote: > The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas. > > 1. The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes.
So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html). > > 2. Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code. > > 3. For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened. I can get behind that. > > Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible. This pull request has now been integrated. Changeset: b28f9dab Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/b28f9dab80bf5d4de89942585c1ed7bb121d9cbd Stats: 162 lines in 16 files changed: 147 ins; 1 del; 14 mod 8290688: Optimize x86_64 nmethod entry barriers Reviewed-by: kvn, rrich ------------- PR: https://git.openjdk.org/jdk/pull/9569 From adinn at openjdk.org Fri Jul 22 14:48:57 2022 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 22 Jul 2022 14:48:57 GMT Subject: RFR: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 13:50:28 GMT, Andrew Haley wrote: > Fix that masks the offsets used when adrp() is passed an unreachable destination. This reloc allows e.g. `adrp; movk; ldr` to access anywhere in the address space.
> > > # SIGSEGV (0xb) at pc=0x0000ffff55964edc, pid=2843096, tid=2850366 > # > # JRE version: Java(TM) SE Runtime Environment (20.0+7) (fastdebug build 20-ea+7-377) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+7-377, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # J 91101 c2 java.io.ObjectOutputStream.enableReplaceObject(Z)Z java.base at 20-ea (47 bytes) @ 0x0000ffff55964edc [0x0000ffff55964e80+0x000000000000005c] src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 342: > 340: static int adrpMovk_impl(address insn_addr, address &target) { > 341: Instruction_aarch64::patch(insn_addr + sizeof (uint32_t), 20, 5, (uintptr_t)target >> 32); > 342: uintptr_t dest = (dest & 0xffffffffULL) | (uintptr_t(insn_addr) & 0xffff00000000ULL); This does not look right. `dest` is not defined on the rhs of this expression. ------------- PR: https://git.openjdk.org/jdk/pull/9615 From eosterlund at openjdk.org Fri Jul 22 14:53:43 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 22 Jul 2022 14:53:43 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers [v2] In-Reply-To: References: Message-ID: > The original nmethod entry barrier supported only concurrent patching of data and was used by ZGC to solve concurrent class unloading problems. Now it is starting to see more uses. Notably, loom uses nmethod entry barriers to figure out what nmethods have been seen on-stack, needed to remove nmethods safely. However, the concurrent data patching variation was too slow for loom (showed a few regressions), so I brought over a faster nmethod entry barrier that we use in the generational ZGC repo, which additionally handles concurrent patching of data and instructions, which is needed there. > However, I still see some small regressions. So for the uses in loom, the classic GCs don't really patch anything interesting concurrently. 
This leads to the following possible enhancements to improve the situation: > > 1. Make a dedicated nmethod entry barrier for GCs that don't patch data nor code concurrently, consisting of basically only a conditional branch. There is no need to penalize STW GCs with seat belts protecting against concurrent races that simply do not exist. > > 2. Move the "guard" word and call into the VM trampoline, to an out-of-line stub towards the end of the nmethod, ensuring instruction caches are not polluted by non-hot instructions at the nmethod entry. Some machines also better optimize the branch-not-taken path of conditional branches. > > With these optimizations, the small regressions did go away. Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9574/files - new: https://git.openjdk.org/jdk/pull/9574/files/58c98ac6..58c98ac6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9574&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9574&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9574.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9574/head:pull/9574 PR: https://git.openjdk.org/jdk/pull/9574 From jvernee at openjdk.org Fri Jul 22 15:01:26 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 22 Jul 2022 15:01:26 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v4] In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 14:18:49 GMT, Jorn Vernee wrote: >> This patch enables lossy conversion warnings (C4244 [1]) for hotspot on Windows/MSVC. Instead of fixing all warnings that were produced from this, I've instead locally disabled the warning in the files that produced warnings.
This allows gradually making progress with cleaning up these warnings on a per-file basis, instead of trying to fix all of them in one shot. i.e. it is not meant as a long term solution, but as a way of allowing incremental progress. >> >> Out of the ~1100 files that make up hotspot on Windows x64 , ~290 have warnings for them disabled (not counting aarch64 files), which means that with this patch ~800 files are protected by enabling this warning globally. >> >> Warnings can be fixed in individual files, or groups of files in followup patches, and warnings for those files can be enabled. >> >> I'm working on a patch that does the same for GCC, but it produces warnings in about 150 more files, so I wanted to gather feedback on this approach before continuing with that. >> >> --- >> >> To disable warnings for a file, in most cases the following prelude is added after the last `#include` at the start of a file: >> >> PRAGMA_DIAG_PUSH >> PRAGMA_ALLOW_LOSSY_CONVERSIONS >> >> And then the following is added at the end of the file for cpp files, or before closing the header guard for hpp files: >> >> PRAGMA_DIAG_POP >> >> 1 notable exception are files produced by adlc, which had their code-gen modified to add these lines instead. There were also 2 files that include headers in the middle of the file (ostream.cpp & sharedRuntime.cpp), for which I've added the PRAGMA's after the include block at the start of the file instead. They only included system headers, for which disabling warnings doesn't matter any ways. >> >> [1]: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-levels-3-and-4-c4244?view=msvc-170 > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits: > > - Merge branch 'master' into Warn_Narrow > - Polish pt2 > - Polish > - Remove PUSH POP from test files > - Remove PUSH POP from cpp files > - Rest of the tests > - More test > - AArch64 > - Disable for tests > - Fix apostrophe > - ... and 5 more: https://git.openjdk.org/jdk/compare/1c055076...fb276afd I think these warnings are a nice-to-have. Kinda like a certain code style. However, I don't think it's important enough to organize people and go through fixing each and every one of them in the 450 or so files all at once. The warnings being either unimportant enough to turn them off like we do now, or being important enough to turn them on and fix all of them is, I think, a false dichotomy. In reality, the importance of fixing these warnings seems to be more in the middle. And I think this is what leads to a stalemate (the original JBS issue for this was filed in 2015 [1]). So, I'd like to suggest a compromise instead. We could have a policy similar to how styling is fixed currently in HotSpot (mostly in the compiler code). In that case styling is fixed in the surrounding code that is touched by a patch. In the case of these warnings, I propose that when a patch touches a file with lossy conversion warnings disabled, it should also enable the warnings and fix them for that file. Maybe it's useful to rename the macro to something like `PRAGMA_FIXME_LOSSY_CONVERSIONS` for that as well. [1] : https://bugs.openjdk.org/browse/JDK-8135181 ------------- PR: https://git.openjdk.org/jdk/pull/9516 From eosterlund at openjdk.org Fri Jul 22 15:09:13 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 22 Jul 2022 15:09:13 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers [v3] In-Reply-To: References: Message-ID: > The original nmethod entry barrier supported only concurrent patching of data and was used by ZGC to solve concurrent class unloading problems. 
Now it is starting to see more uses. Notably, loom uses nmethod entry barriers to figure out what nmethods have been seen on-stack, needed to remove nmethods safely. However, the concurrent data patching variation was too slow for loom (showed a few regressions), so I brought over a faster nmethod entry barrier that we use in the generational ZGC repo, which additionally handles concurrent patching of data and instructions, which is needed there. > However, I still see some small regressions. So for the uses in loom, the classic GCs don't really patch anything interesting concurrently. This leads to the following possible enhancements to improve the situation: > > 1. Make a dedicated nmethod entry barrier for GCs that don't patch data nor code concurrently, consisting of basically only a conditional branch. There is no need to penalize STW GCs with seat belts protecting against concurrent races that simply do not exist. > > 2. Move the "guard" word and call into the VM trampoline, to an out-of-line stub towards the end of the nmethod, ensuring instruction caches are not polluted by non-hot instructions at the nmethod entry. Some machines also better optimize the branch-not-taken path of conditional branches. > > With these optimizations, the small regressions did go away. Erik Österlund has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: - Merge branch 'master' into 8290700_faster_aarch64_entry_barriers - 8290700: Optimize AArch64 nmethod entry barriers - fixing 32 bit build again - fix 32 bit build again - 32 bit build fix - Optimize x86 nmethod entry barriers ------------- Changes: https://git.openjdk.org/jdk/pull/9574/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9574&range=02 Stats: 179 lines in 14 files changed: 121 ins; 22 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/9574.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9574/head:pull/9574 PR: https://git.openjdk.org/jdk/pull/9574 From aph at openjdk.org Fri Jul 22 15:30:33 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 15:30:33 GMT Subject: RFR: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java [v2] In-Reply-To: References: Message-ID: <3dWjq0B4Au6IqDRbu-onhNfOvlbSg8pofH0YHHuJAL8=.06ccad4d-9006-4a30-b76f-64074d767f5a@github.com> > Fix that masks the offsets used when adrp() is passed an unreachable destination. This reloc allows e.g. `adrp; movk; ldr` to access anywhere in the address space. 
> > > # SIGSEGV (0xb) at pc=0x0000ffff55964edc, pid=2843096, tid=2850366 > # > # JRE version: Java(TM) SE Runtime Environment (20.0+7) (fastdebug build 20-ea+7-377) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+7-377, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # J 91101 c2 java.io.ObjectOutputStream.enableReplaceObject(Z)Z java.base at 20-ea (47 bytes) @ 0x0000ffff55964edc [0x0000ffff55964e80+0x000000000000005c] Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9615/files - new: https://git.openjdk.org/jdk/pull/9615/files/db3866f8..f973e7a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9615&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9615&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9615.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9615/head:pull/9615 PR: https://git.openjdk.org/jdk/pull/9615 From aph at openjdk.org Fri Jul 22 15:30:35 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 15:30:35 GMT Subject: RFR: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java [v2] In-Reply-To: References: Message-ID: <26zeQip-1hhMbaanI9nznyknReqTxgqNahW-deiZNqw=.f19aa57c-c5d7-4b8d-8f2c-b06b671f9d3e@github.com> On Fri, 22 Jul 2022 14:45:13 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 342: > >> 340: static int adrpMovk_impl(address insn_addr, address &target) { >> 341: Instruction_aarch64::patch(insn_addr + sizeof (uint32_t), 20, 5, (uintptr_t)target >> 32); >> 342: uintptr_t 
dest = (dest & 0xffffffffULL) | (uintptr_t(insn_addr) & 0xffff00000000ULL); > > This does not look right. `dest` is not defined on the rhs of this expression. Err, I have no idea. Fixed. ------------- PR: https://git.openjdk.org/jdk/pull/9615 From dlong at openjdk.org Fri Jul 22 18:21:01 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Jul 2022 18:21:01 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v6] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 13:42:09 GMT, Julian Waters wrote: >> Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > New changes src/hotspot/share/compiler/compilationPolicy.hpp line 47: > 45: * for the interpreter and ciMethod::ensure_method_data, ciMethod.cpp for C1), and interacts > 46: * with C1 and C2 via the compiler interface. It is updated periodically as more profiling > 47: * information is gathered, directly in the case of the interpreter and through ciMethodData This is still a bit misleading. The information flow between MethodData and ciMethodData is one-way. Only MethodData are updated by the interpreter or generated code. ciMethodData is just a read-only snapshot used during compilation. 
------------- PR: https://git.openjdk.org/jdk/pull/9598 From dlong at openjdk.org Fri Jul 22 18:38:39 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Jul 2022 18:38:39 GMT Subject: RFR: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java [v2] In-Reply-To: <3dWjq0B4Au6IqDRbu-onhNfOvlbSg8pofH0YHHuJAL8=.06ccad4d-9006-4a30-b76f-64074d767f5a@github.com> References: <3dWjq0B4Au6IqDRbu-onhNfOvlbSg8pofH0YHHuJAL8=.06ccad4d-9006-4a30-b76f-64074d767f5a@github.com> Message-ID: On Fri, 22 Jul 2022 15:30:33 GMT, Andrew Haley wrote: >> Fix that masks the offsets used when adrp() is passed an unreachable destination. This reloc allows e.g. `adrp; movk; ldr` to access anywhere in the address space. >> >> >> # SIGSEGV (0xb) at pc=0x0000ffff55964edc, pid=2843096, tid=2850366 >> # >> # JRE version: Java(TM) SE Runtime Environment (20.0+7) (fastdebug build 20-ea+7-377) >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+7-377, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) >> # Problematic frame: >> # J 91101 c2 java.io.ObjectOutputStream.enableReplaceObject(Z)Z java.base at 20-ea (47 bytes) @ 0x0000ffff55964edc [0x0000ffff55964e80+0x000000000000005c] > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java Looks good. To detect bad addresses at emit time, you could add some asserts that check is_valid_AArch64_address(target) in _adrp() and the patch code. Also maybe check after patching that the desired value was rewritten using target_addr_for_insn(). ------------- Marked as reviewed by dlong (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9615 From kvn at openjdk.org Fri Jul 22 18:42:15 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Jul 2022 18:42:15 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v6] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 13:42:09 GMT, Julian Waters wrote: >> Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > New changes Please note. CI (compiler interface) and its API (ciMethod*) is used by JIT compilers C1 and C2 only **during** compilation. Compilers (C1 and C2) can create MDO (through CI) if it is missed but they don't update data in it. Only Interpreter and tier 3 (profiling) **compiled code** produced by C1 updates MDO. It does it directly without CI. 
------------- PR: https://git.openjdk.org/jdk/pull/9598 From kvn at openjdk.org Fri Jul 22 18:42:17 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Jul 2022 18:42:17 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 08:51:55 GMT, Dean Long wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Correct comment with respect to review > > C1 uses ciMethod::ensure_method_data(), which calls Method::build_interpreter_method_data(), to create an MDO if one wasn't already created by the interpreter. So the name build_interpreter_method_data() is a bit misleading, because C1 will use the same MDO as the interpreter. > > I also found a comment in c1_globals.hpp about C1UpdateMethodData that mentions tier1. I think the comment should be changed to say tier3. @dean-long posted his comment before I finished my :) But we are saying the same thing. ------------- PR: https://git.openjdk.org/jdk/pull/9598 From duke at openjdk.org Fri Jul 22 18:58:15 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Fri, 22 Jul 2022 18:58:15 GMT Subject: RFR: 8280152: AArch64: Reuse runtime call trampolines [v5] In-Reply-To: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> References: <2Rz88X0uWMdi7N4NFC36ZiMXgOhUmh0XehnaOKo6JWM=.9422ee14-4e73-47a5-a211-842fa5331391@github.com> Message-ID: > A trampoline stub could be generated for each runtime call. These trampolines could be duplication if the callees are the same. This change delays the stub generation and generates one stub for a distinct callee. > > Benchmark als, chi-square, dec-tree, gauss-mix, log-regression, movie-lens, naive-bayes, page-rank, fj-means, reactors, future-genetic, mnemonics, dotty, scala-kmeans, and finagle-http in Renaissance (0.14.1) are tested. 
The sum of the used size of CodeHeap 'non-profiled nmethods' and CodeHeap 'profiled nmethods' shows ~4.7% reduction on average. Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Reduce maximum hash table size and cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9405/files - new: https://git.openjdk.org/jdk/pull/9405/files/9e5acc96..baf72825 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9405&range=03-04 Stats: 13 lines in 3 files changed: 3 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/9405.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9405/head:pull/9405 PR: https://git.openjdk.org/jdk/pull/9405 From kvn at openjdk.org Fri Jul 22 19:13:20 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 22 Jul 2022 19:13:20 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 15:09:13 GMT, Erik Österlund wrote: >> The original nmethod entry barrier supported only concurrent patching of data and was used by ZGC to solve concurrent class unloading problems. Now it is starting to see more uses. Notably, loom uses nmethod entry barriers to figure out what nmethods have been seen on-stack, needed to remove nmethods safely. However, the concurrent data patching variation was too slow for loom (showed a few regressions), so I brought over a faster nmethod entry barrier that we use in the generational ZGC repo, which additionally handles concurrent patching of data and instructions, which is needed there. >> However, I still see some small regressions. So for the uses in loom, the classic GCs don't really patch anything interesting concurrently. This leads to the following possible enhancements to improve the situation: >> >> 1.
Make a dedicated nmethod entry barrier for GCs that don't patch data nor code concurrently, consisting of basically only a conditional branch. There is no need to penalize STW GCs with seat belts protecting against concurrent races that simply do not exist. >> >> 2. Move the "guard" word and call into the VM trampoline, to an out-of-line stub towards the end of the nmethod, ensuring instruction caches are not polluted by non-hot instructions at the nmethod entry. Some machines also better optimize the branch-not-taken path of conditional branches. >> >> With these optimizations, the small regressions did go away. > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into 8290700_faster_aarch64_entry_barriers > - 8290700: Optimize AArch64 nmethod entry barriers > - fixing 32 bit build again > - fix 32 bit build again > - 32 bit build fix > - Optimize x86 nmethod entry barriers Looks reasonable to me. Someone familiar with aarch64 code has to review it. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9574 From aph at openjdk.org Fri Jul 22 19:34:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Jul 2022 19:34:05 GMT Subject: RFR: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java [v2] In-Reply-To: References: <3dWjq0B4Au6IqDRbu-onhNfOvlbSg8pofH0YHHuJAL8=.06ccad4d-9006-4a30-b76f-64074d767f5a@github.com> Message-ID: <1TnWBvAb0dQhi5nMBU9K1XLTCg9DsD5Cyi6d2QrDGlk=.04ebc10a-8652-47ad-9fc3-82a66c634a26@github.com> On Fri, 22 Jul 2022 18:35:29 GMT, Dean Long wrote: > Looks good. To detect bad addresses at emit time, you could add some asserts that check is_valid_AArch64_address(target) in _adrp() and the patch code. I'll look at that. > Also maybe check after patching that the desired value was rewritten using target_addr_for_insn(). I already do that in `verify()`. Thanks.
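The address masking discussed in this thread can be modelled in isolation. The sketch below is not the HotSpot patching code and does not touch instruction encodings. It only reproduces the bit arithmetic from the quoted `adrpMovk_impl` snippet (`dest & 0xffffffffULL`, `insn_addr & 0xffff00000000ULL`, `target >> 32`), assuming the fix initialises `dest` from `target` and that addresses fit in 48 bits. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Destination handed to adrp in this model: the low 32 bits of the real
// target, with bits 32..47 borrowed from the instruction's own address so
// that the two addresses differ only in their low 32 bits.
inline uint64_t adrp_reachable_dest(uint64_t insn_addr, uint64_t target) {
  uint64_t dest = target;  // assumption: dest starts out as the target
  return (dest & 0xffffffffULL) | (insn_addr & 0xffff00000000ULL);
}

// Models `movk reg, #imm16, lsl #32`: replace bits 32..47, keep the rest.
inline uint64_t movk_lsl32(uint64_t reg, uint16_t imm16) {
  return (reg & ~0xffff00000000ULL) | (uint64_t(imm16) << 32);
}

// Models the value seen after an `adrp; movk; ldr`-style sequence.
inline uint64_t adrp_movk_result(uint64_t insn_addr, uint64_t target) {
  uint64_t dest = adrp_reachable_dest(insn_addr, target);
  uint64_t page = dest & ~0xfffULL;  // adrp yields a 4 KiB page address
  uint64_t reg = movk_lsl32(page, uint16_t(target >> 32));
  // The low 12 bits are supplied by the following instruction's offset.
  return reg | (target & 0xfffULL);
}
```

A round-trip comparison of `adrp_movk_result(insn_addr, target)` against `target` is, in spirit, the same kind of post-patch check that the review suggests and that `verify()` is said to perform already.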
------------- PR: https://git.openjdk.org/jdk/pull/9615 From eosterlund at openjdk.org Fri Jul 22 19:52:01 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 22 Jul 2022 19:52:01 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 19:09:34 GMT, Vladimir Kozlov wrote: >> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge branch 'master' into 8290700_faster_aarch64_entry_barriers >> - 8290700: Optimize AArch64 nmethod entry barriers >> - fixing 32 bit build again >> - fix 32 bit build again >> - 32 bit build fix >> - Optimize x86 nmethod entry barriers > > Looks reasonable to me. > Someone familiar with aarch64 code has to review it. Thanks for the review, @vnkozlov! ------------- PR: https://git.openjdk.org/jdk/pull/9574 From dlong at openjdk.org Fri Jul 22 20:09:15 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Jul 2022 20:09:15 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 15:09:13 GMT, Erik Österlund wrote: >> The original nmethod entry barrier supported only concurrent patching of data and was used by ZGC to solve concurrent class unloading problems. Now it is starting to see more uses. Notably, loom uses nmethod entry barriers to figure out what nmethods have been seen on-stack, needed to remove nmethods safely. However, the concurrent data patching variation was too slow for loom (showed a few regressions), so I brought over a faster nmethod entry barrier that we use in the generational ZGC repo, which additionally handles concurrent patching of data and instructions, which is needed there. >> However, I still see some small regressions. So for the uses in loom, the classic GCs don't really patch anything interesting concurrently.
This leads to the following possible enhancements to improve the situation: >> >> 1. Make a dedicated nmethod entry barrier for GCs that don't patch data nor code concurrently, consisting of basically only a conditional branch. There is no need to penalize STW GCs with seat belts protecting against concurrent races that simply do not exist. >> >> 2. Move the "guard" word and call into the VM trampoline, to an out-of-line stub towards the end of the nmethod, ensuring instruction caches are not polluted by non-hot instructions at the nmethod entry. Some machines also better optimize the branch-not-taken path of conditional branches. >> >> With these optimizations, the small regressions did go away. > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into 8290700_faster_aarch64_entry_barriers > - 8290700: Optimize AArch64 nmethod entry barriers > - fixing 32 bit build again > - fix 32 bit build again > - 32 bit build fix > - Optimize x86 nmethod entry barriers Marked as reviewed by dlong (Reviewer). src/hotspot/cpu/aarch64/gc/shared/barrierSetNMethod_aarch64.cpp line 85: > 83: } > 84: ShouldNotReachHere(); > 85: } I'm guessing this function is not performance-critical. If it was, then we could consider adding a new field to the nmethod to keep track of the offset for the guard. ------------- PR: https://git.openjdk.org/jdk/pull/9574 From manc at openjdk.org Fri Jul 22 23:46:24 2022 From: manc at openjdk.org (Man Cao) Date: Fri, 22 Jul 2022 23:46:24 GMT Subject: RFR: 8290900: Build failure with Clang 14+ due to function warning attribute Message-ID: Hi all, Could anyone review this change that fixes build failure with Clang 14+? The plan is to disable the warning attribute introduced by JDK-8214976, until https://github.com/llvm/llvm-project/issues/56519 is fixed and released.
-Man ------------- Commit messages: - 8290900: Build failure with Clang 14+ due to function warning attribute Changes: https://git.openjdk.org/jdk/pull/9621/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9621&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290900 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9621.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9621/head:pull/9621 PR: https://git.openjdk.org/jdk/pull/9621 From kbarrett at openjdk.org Sat Jul 23 01:40:08 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 23 Jul 2022 01:40:08 GMT Subject: RFR: 8290900: Build failure with Clang 14+ due to function warning attribute In-Reply-To: References: Message-ID: <3rAfSeIdXXGumi1V3OVUdWG0IW_dOE1Mw6sFuYXCKN8=.c444a39a-7ccd-4acd-afa5-1023fa1e13cd@github.com> On Fri, 22 Jul 2022 23:40:08 GMT, Man Cao wrote: > Hi all, > > Could anyone review this change that fixes build failure with Clang 14+? > The plan is to disable the warning attribute introduced by JDK-8214976, until https://github.com/llvm/llvm-project/issues/56519 is fixed and released. > > -Man Looks good. Please create an OpenJDK RFE to reinstate the feature once there is a clang version that is known to work. I'm also not sure why one would even want this new attribute if it isn't additive. (And how do forward declarations deal with non-additive attributes?) If one is the provider of a library and so providing the "first" declaration, and have decided this library function shouldn't be used any more while still providing it for backward compatibility, well, that's what deprecation warnings are for. The use-case from OpenJDK is a client of a library deciding (for whatever reasons) the client code shouldn't (casually/normally) use this library function, so marks it accordingly when compiling client code. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9621 From manc at openjdk.org Sat Jul 23 02:55:12 2022 From: manc at openjdk.org (Man Cao) Date: Sat, 23 Jul 2022 02:55:12 GMT Subject: RFR: 8290900: Build failure with Clang 14+ due to function warning attribute In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 23:40:08 GMT, Man Cao wrote: > Hi all, > > Could anyone review this change that fixes build failure with Clang 14+? > The plan is to disable the warning attribute introduced by JDK-8214976, until https://github.com/llvm/llvm-project/issues/56519 is fixed and released. > > -Man Thanks for the review. Opened https://bugs.openjdk.org/browse/JDK-8290903 to reinstate the feature. > I'm also not sure why one would even want this new attribute if it isn't additive. https://reviews.llvm.org/D106030 and https://github.com/ClangBuiltLinux/linux/issues/1173 might have some explanation. It seems to support some use cases in Linux kernel, so Clang issues warning/errors at compile time instead of link time. There is also a pre-submit test failure, which looks like being investigated in https://bugs.openjdk.org/browse/JDK-8290885. ------------- PR: https://git.openjdk.org/jdk/pull/9621 From manc at openjdk.org Sat Jul 23 02:57:32 2022 From: manc at openjdk.org (Man Cao) Date: Sat, 23 Jul 2022 02:57:32 GMT Subject: Integrated: 8290900: Build failure with Clang 14+ due to function warning attribute In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 23:40:08 GMT, Man Cao wrote: > Hi all, > > Could anyone review this change that fixes build failure with Clang 14+? > The plan is to disable the warning attribute introduced by JDK-8214976, until https://github.com/llvm/llvm-project/issues/56519 is fixed and released. > > -Man This pull request has now been integrated. 
Changeset: 0599a05f Author: Man Cao URL: https://git.openjdk.org/jdk/commit/0599a05f8c7e26d4acae0b2cc805a65bdd6c6f67 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod 8290900: Build failure with Clang 14+ due to function warning attribute Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/9621 From eosterlund at openjdk.org Sat Jul 23 03:05:08 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Sat, 23 Jul 2022 03:05:08 GMT Subject: RFR: 8290700: Optimize AArch64 nmethod entry barriers [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 20:06:30 GMT, Dean Long wrote: >> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge branch 'master' into 8290700_faster_aarch64_entry_barriers >> - 8290700: Optimize AArch64 nmethod entry barriers >> - fixing 32 bit build again >> - fix 32 bit build again >> - 32 bit build fix >> - Optimize x86 nmethod entry barriers > > Marked as reviewed by dlong (Reviewer). Thanks for the review @dean-long! > src/hotspot/cpu/aarch64/gc/shared/barrierSetNMethod_aarch64.cpp line 85: > >> 83: } >> 84: ShouldNotReachHere(); >> 85: } > > I'm guessing this function is not performance-critical. If it was, then we could consider adding a new field to the nmethod to keep track of the offset for the guard. Yeah I thought about adding a new entry to CodeOffsets, but this C2-only, AArch64-only detail seemed a bit noisy in the shared code then. And it didn't seem warranted, given how cold this path is by design.
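To make the hot/cold split discussed in this thread a little more concrete, here is a toy model of an entry barrier whose fast path is a single compare and branch. It is only a sketch and not the AArch64 implementation. `NMethodModel`, `enter`, and the idea of comparing the guard against a "disarmed" value are assumptions made for illustration; the thread itself only says that the fast path is basically a conditional branch and that the guard word and the call into the VM live out of line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a compiled method carrying a guard word.
struct NMethodModel {
  uint32_t guard = 0;       // the "guard" word read at method entry
  int slow_path_calls = 0;  // how often the cold, out-of-line path ran
};

// Models method entry. Returns true when the slow path was taken.
// Hot path: one comparison, branch not taken.
// Cold path: stands in for the out-of-line stub that calls into the VM,
// which in this model simply disarms the guard again.
inline bool enter(NMethodModel& nm, uint32_t disarmed_value) {
  if (nm.guard == disarmed_value) {
    return false;
  }
  ++nm.slow_path_calls;
  nm.guard = disarmed_value;
  return true;
}
```

In this model, changing the disarmed value re-arms every method at once, and each method then pays for the slow path only on its next entry.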
------------- PR: https://git.openjdk.org/jdk/pull/9574 From dnsimon at openjdk.org Sat Jul 23 06:47:01 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 23 Jul 2022 06:47:01 GMT Subject: RFR: 8290075: [JVMCI] only blessed methods can link against EventWriterFactory.getEventWriter [v2] In-Reply-To: References: Message-ID: <2A2GsZuoiIVGVQ9ZmInsM2XQWKvvkxwDGbxb2GmsEz8=.928ad93a-85f7-488b-a7fc-15c44604c131@github.com> On Mon, 11 Jul 2022 13:26:34 GMT, Doug Simon wrote: >> [JDK-8282420](https://bugs.openjdk.org/browse/JDK-8282420) introduced the notion of "blessed methods" which are those that can link against `jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long)`. >> This PR enhances the JVMCI ConstantPool API so that it can take a caller context when resolving a method to enforce this constraint properly. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > support special linkage rules for jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long) in JVMCI @egahlin @mgronlun could you please help review this PR. ------------- PR: https://git.openjdk.org/jdk/pull/9449 From jwaters at openjdk.org Sat Jul 23 08:00:02 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 23 Jul 2022 08:00:02 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v6] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 18:36:54 GMT, Vladimir Kozlov wrote: > Please note. CI (compiler interface) and its API (ciMethod*) is used by JIT compilers C1 and C2 only **during** compilation. Compilers (C1 and C2) can create MDO (through CI) if it is missed but they don't update data in it. Only Interpreter and tier 3 (profiling) **compiled code** produced by C1 updates MDO. It does it directly without CI. 
Ah, my mistake. I'll fix the issue asap ------------- PR: https://git.openjdk.org/jdk/pull/9598 From egahlin at openjdk.org Sat Jul 23 08:01:58 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Sat, 23 Jul 2022 08:01:58 GMT Subject: RFR: 8290075: [JVMCI] only blessed methods can link against EventWriterFactory.getEventWriter [v2] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 13:26:34 GMT, Doug Simon wrote: >> [JDK-8282420](https://bugs.openjdk.org/browse/JDK-8282420) introduced the notion of "blessed methods" which are those that can link against `jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long)`. >> This PR enhances the JVMCI ConstantPool API so that it can take a caller context when resolving a method to enforce this constraint properly. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > support special linkage rules for jdk.jfr.internal.event.EventWriterFactory.getEventWriter(long) in JVMCI This is probably best reviewed by Markus as he did the implementation for C1/C2. He is on vacation, but will be back sometime in august. ------------- PR: https://git.openjdk.org/jdk/pull/9449 From kbarrett at openjdk.org Sat Jul 23 10:25:03 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 23 Jul 2022 10:25:03 GMT Subject: RFR: 8290900: Build failure with Clang 14+ due to function warning attribute In-Reply-To: References: Message-ID: On Sat, 23 Jul 2022 02:52:41 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone review this change that fixes build failure with Clang 14+? >> The plan is to disable the warning attribute introduced by JDK-8214976, until https://github.com/llvm/llvm-project/issues/56519 is fixed and released. >> >> -Man > > Thanks for the review. 
> > Opened https://bugs.openjdk.org/browse/JDK-8290903 to reinstate the feature. > >> I'm also not sure why one would even want this new attribute if it isn't additive. > > https://reviews.llvm.org/D106030 and https://github.com/ClangBuiltLinux/linux/issues/1173 might have some explanation. It seems to support some use cases in Linux kernel, so Clang issues warning/errors at compile time instead of link time. > > There is also a pre-submit test failure, which looks like being investigated in https://bugs.openjdk.org/browse/JDK-8290885. @caoman - as a HotSpot change, this should have had a second reviewer before being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/9621 From duke at openjdk.org Sat Jul 23 13:18:05 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Sat, 23 Jul 2022 13:18:05 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the generation of broadcasting a scalar in several ways: > > - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. > - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. > - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay > > With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>
>                                      Before             After
> Benchmark                  Mode  Cnt   Score   Error    Score   Error   Units    Gain
> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598   38.771 ± 0.797   ns/op   +9.03%
> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464   38.603 ± 0.367   ns/op   +8.62%
> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791   13.755 ± 0.375   ns/op  +33.17%
> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781   13.663 ± 0.387   ns/op  +23.22%
>
> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. > > This patch also removes some redundant code paths and renames some incorrectly named instructions. > > Thank you very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - rename - consolidate sse checks - benchmark - fix - Merge branch 'master' into improveReplicate - remove duplicate - unsignness - rematerializing input count - fix comparison - fix rematerialize, constant deduplication - ... and 8 more: https://git.openjdk.org/jdk/compare/0599a05f...6c10f9ad ------------- Changes: https://git.openjdk.org/jdk/pull/7832/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=07 Stats: 563 lines in 14 files changed: 360 ins; 85 del; 118 mod Patch: https://git.openjdk.org/jdk/pull/7832.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7832/head:pull/7832 PR: https://git.openjdk.org/jdk/pull/7832 From duke at openjdk.org Sat Jul 23 13:20:03 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Sat, 23 Jul 2022 13:20:03 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v7] In-Reply-To: References: Message-ID: <-Zs75LtCJld4LgO6NVDKGTA5cuVrWTtAWRe_9-iOGX0=.f751e609-0c4c-47e8-9131-21566aba130c@github.com> On Fri, 18 Mar 2022 00:29:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the generation of broadcasting a scalar in several ways: >> >> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants.
Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. >> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. >> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay >> >> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>>
>>                                        Before            After
>> Benchmark                  Mode  Cnt   Score    Error    Score    Error  Units    Gain
>> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598   38.771 ± 0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464   38.603 ± 0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791   13.755 ± 0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781   13.663 ± 0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. >> >> This patch also removes some redundant code paths and renames some incorrectly named instructions. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > remove duplicate I have come back to this PR with some modifications and added a benchmark for this patch. The description is also modified to better present the objective of this patch and show the results. Please have some reviews, thank you very much.
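
For readers checking the numbers: the Gain column in the benchmark table quoted above is consistent with the relative reduction in ns/op, (before - after) / before. A minimal C++ sketch of that arithmetic follows; the function names are hypothetical and are not part of the patch or of JMH.

```cpp
#include <cassert>
#include <cmath>

// Relative improvement, in percent, for a lower-is-better score such as ns/op.
// Hypothetical helper for illustration only.
inline double gain_percent(double before, double after) {
  assert(before > 0.0);
  return (before - after) / before * 100.0;
}

// True if a reported gain (rounded to two decimals) matches the computed one.
inline bool matches_reported(double before, double after, double reported) {
  return std::fabs(gain_percent(before, after) - reported) < 0.01;
}
```

For example, `matches_reported(42.621, 38.771, 9.03)` holds for the `testDouble` row. Note that the Error columns are not taken into account here; the `testInt` and `testLong` baselines carry comparatively wide error bounds.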
------------- PR: https://git.openjdk.org/jdk/pull/7832 From jwaters at openjdk.org Sat Jul 23 13:59:59 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 23 Jul 2022 13:59:59 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v7] In-Reply-To: References: Message-ID: > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter), while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly misleading, as it seems to imply that MethodData is used in tier 3 as well, when profiling with C1 is done through ciMethodData instead. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. 
Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Fixup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9598/files - new: https://git.openjdk.org/jdk/pull/9598/files/c48d67f3..a6b75011 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9598&range=05-06 Stats: 12 lines in 2 files changed: 1 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/9598.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9598/head:pull/9598 PR: https://git.openjdk.org/jdk/pull/9598 From jwaters at openjdk.org Sat Jul 23 14:00:03 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 23 Jul 2022 14:00:03 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v6] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 18:17:26 GMT, Dean Long wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> New changes > > src/hotspot/share/compiler/compilationPolicy.hpp line 47: > >> 45: * for the interpreter and ciMethod::ensure_method_data, ciMethod.cpp for C1), and interacts >> 46: * with C1 and C2 via the compiler interface. It is updated periodically as more profiling >> 47: * information is gathered, directly in the case of the interpreter and through ciMethodData > > This is still a bit misleading. The information flow between MethodData and ciMethodData is one-way. Only MethodData are updated by the interpreter or generated code. ciMethodData is just a read-only snapshot used during compilation. 
Revised ------------- PR: https://git.openjdk.org/jdk/pull/9598 From manc at openjdk.org Sat Jul 23 23:30:44 2022 From: manc at openjdk.org (Man Cao) Date: Sat, 23 Jul 2022 23:30:44 GMT Subject: RFR: 8290900: Build failure with Clang 14+ due to function warning attribute In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 23:40:08 GMT, Man Cao wrote: > Hi all, > > Could anyone review this change that fixes build failure with Clang 14+? > The plan is to disable the warning attribute introduced by JDK-8214976, until https://github.com/llvm/llvm-project/issues/56519 is fixed and released. > > -Man Oh, sorry about that. I saw this under "Progress": > Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer)) I guess it didn't take into account the rules of different subcomponents. ------------- PR: https://git.openjdk.org/jdk/pull/9621 From dholmes at openjdk.org Mon Jul 25 01:19:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 25 Jul 2022 01:19:04 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v4] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 14:56:08 GMT, Jorn Vernee wrote: > I propose that when a patch touches a file with lossy conversion warnings disabled, it should also enable the warnings and fix them for that file. This contradicts other common recommendations of not mixing unrelated issues in the same PR. Fixing style nits is harmless but fixing an actual lossy conversion issue could itself lead to bugs and unexpected fanout.
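
To make the concern above concrete, here is a small C++ sketch of why "fixing" a lossy conversion is a behavioural decision rather than a style nit. It assumes the warnings in question are of the MSVC "possible loss of data" kind (for example C4244); the helper names are hypothetical and do not come from the HotSpot sources.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Silencing the warning with a cast keeps the old behaviour:
// the upper 32 bits are dropped (well-defined for unsigned types).
inline uint32_t truncate_to_u32(uint64_t value) {
  return static_cast<uint32_t>(value);
}

// A checked alternative (hypothetical helper): refuses values that do not fit.
// Choosing this changes what callers observe, which is where fanout comes from.
inline bool try_narrow_to_u32(uint64_t value, uint32_t* out) {
  if (value > std::numeric_limits<uint32_t>::max()) {
    return false;  // would lose data
  }
  *out = static_cast<uint32_t>(value);
  return true;
}
```

With `value = 0x100000001`, the first function quietly returns 1 while the second reports failure, so picking between them is a change that arguably deserves its own review rather than a drive-by edit.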
------------- PR: https://git.openjdk.org/jdk/pull/9516 From dholmes at openjdk.org Mon Jul 25 02:06:48 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 25 Jul 2022 02:06:48 GMT Subject: RFR: 8290840: Refactor the "os" class In-Reply-To: References: Message-ID: <4hI3UWcrI1Wp5F7aFAsbzq32pwqbBQ_IpQHaaR_J-DM=.fdbd805b-3132-4e69-a052-784a69a3150c@github.com> On Thu, 21 Jul 2022 19:45:13 GMT, Ioi Lam wrote: > Please see [JDK-8290840](https://bugs.openjdk.org/browse/JDK-8290840) for the detailed proposal. > > The `os` class, declared in os.hpp, forms the major part of the HotSpot porting interface. Its structure has gradually deteriorated over the years as new ports are created and new APIs are added. > > This RFE tries to address the following: > > - Clearly specify where a porting API should be declared and defined among the various `os*.cpp` and `os*.hpp` files. > - Avoid the inappropriate inclusion of OS-specific APIs (such as the `os::Linux class`) by platform-independent source files. src/hotspot/os/aix/os_aix.cpp line 3009: > 3007: void os::print_active_locale(outputStream* st) { > 3008: os::Posix::print_active_locale(st); > 3009: } You should not need this indirection. The code in os::Posix::x should be the implementation os::x for the os class, but just defined in os_posix.cpp. src/hotspot/share/runtime/os.hpp line 43: > 41: // =================================================================== > 42: // > 43: // The "os" class defines a large part of the interfaces for porting HotSpot s/a large part/a number of/ src/hotspot/share/runtime/os.hpp line 62: > 60: // - src/hotspot/os/posix/os_posix.hpp > 61: // > 62: // These headers declare of APIs that should be used only within the delete "of" src/hotspot/share/runtime/os.hpp line 86: > 84: // > 85: // - src/hotspot/os//os_.inline.hpp > 86: // - src/hotspot/os_cpu/_/os__.inline.hpp They have to be inline? Why? 
I would expect `os_.cpp` to be able to define implementations of a platform-specific os.hpp function. ------------- PR: https://git.openjdk.org/jdk/pull/9600 From stuefe at openjdk.org Mon Jul 25 05:22:05 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 25 Jul 2022 05:22:05 GMT Subject: RFR: 8290812: Add a test for ResourceHashtable [v3] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 01:06:07 GMT, Coleen Phillimore wrote: >> I added a test for ResourceHashtable to show the interactions with it and Symbol* refcounting. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Clarify the do_entry functions in two deleter classes. Hi Coleen, thank you for the explanations. I see what you do now. This looks fine to me. Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/9603 From eosterlund at openjdk.org Mon Jul 25 07:12:49 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 25 Jul 2022 07:12:49 GMT Subject: Integrated: 8290700: Optimize AArch64 nmethod entry barriers In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 16:54:53 GMT, Erik Österlund wrote: > The original nmethod entry barrier supported only concurrent patching of data and was used by ZGC to solve concurrent class unloading problems. Now it is starting to see more uses. Notably, loom uses nmethod entry barriers to figure out what nmethods have been seen on-stack, needed to remove nmethods safely. However, the concurrent data patching variation was too slow for loom (showed a few regressions), so I brought over a faster nmethod entry barrier that we use in the generational ZGC repo, which additionally handles concurrent patching of data and instructions, which is needed there. > However, I still see some small regressions.
So for the uses in loom, the classic GCs don't really patch anything interesting concurrently. This leads to the following possible enhancements to improve the situation: > > 1. Make a dedicated nmethod entry barrier for GCs that patch neither data nor code concurrently, consisting of basically only a conditional branch. There is no need to penalize STW GCs with seat belts protecting against concurrent races that simply do not exist. > > 2. Move the "guard" word and call into the VM trampoline, to an out-of-line stub towards the end of the nmethod, ensuring instruction caches are not polluted by non-hot instructions at the nmethod entry. Some machines also better optimize the branch-not-taken path of conditional branches. > > With these optimizations, the small regressions did go away. This pull request has now been integrated. Changeset: 228e8e94 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/228e8e94fe048e56d5513b150060c9b54f15642c Stats: 179 lines in 14 files changed: 121 ins; 22 del; 36 mod 8290700: Optimize AArch64 nmethod entry barriers Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/9574 From shade at openjdk.org Mon Jul 25 08:25:52 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Jul 2022 08:25:52 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly.
>> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work All right, so the current status is that all platforms (except for S390X test) are checked. GHA are also happy. Please do formal reviews :) ------------- PR: https://git.openjdk.org/jdk/pull/9576 From lucy at openjdk.org Mon Jul 25 08:30:03 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 25 Jul 2022 08:30:03 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Jul 2022 08:22:45 GMT, Aleksey Shipilev wrote: > All right, so the current status is that all platforms (except for S390X test) are checked. GHA are also happy. I do not see a particular risk for s390x, mainly because the path being removed was never implemented for s390x. @backwaterred may be able to build and run tier1 tests to add some safety to my assumptions. 
------------- PR: https://git.openjdk.org/jdk/pull/9576 From eosterlund at openjdk.org Mon Jul 25 08:53:48 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 25 Jul 2022 08:53:48 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work Looks good in general. However, the StorePConditionalNode is dead after this change, and I think we should remove that one as well while we are at it. AFAIK it is only used for the atomic bump pointer code.
There is a whole bunch of StorePConditional matching rules in all the ad files that can be nuked, and a few mentions of Op_StorePConditional in C2 that can be removed. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/9576 From shade at openjdk.org Mon Jul 25 09:08:57 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Jul 2022 09:08:57 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Jul 2022 08:50:17 GMT, Erik Österlund wrote: > Looks good in general. However, the StorePConditionalNode is dead after this change, and I think we should remove that one as well while we are at it. AFAIK it is only used for the atomic bump pointer code. There is a whole bunch of StorePConditional matching rules in all the ad files that can be nuked, and a few mentions of Op_StorePConditional in C2 that can be removed. True. I'd rather do that after this change lands, as to not invalidate the platform testing already done here. It would be also beneficial for potential bisects to have two atomic commits -- one removing the code from allocation paths, and another removing the C2 nodes and match rules. WDYT?
------------- PR: https://git.openjdk.org/jdk/pull/9576 From duke at openjdk.org Mon Jul 25 09:39:43 2022 From: duke at openjdk.org (Axel Boldt-Christmas) Date: Mon, 25 Jul 2022 09:39:43 GMT Subject: RFR: 8290074: Remove implicit arguments for RegisterMap constructor Message-ID: 8290074: Remove implicit arguments for RegisterMap constructor ------------- Commit messages: - Fix spurious semicolon - Change to scoped enum - 8290074: Remove implicit arguments for RegisterMap constructor - 8290074: Remove implicit arguments for RegisterMap constructor - 8290074: Remove implicit arguments for RegisterMap constructor Changes: https://git.openjdk.org/jdk/pull/9455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9455&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290074 Stats: 434 lines in 40 files changed: 317 ins; 0 del; 117 mod Patch: https://git.openjdk.org/jdk/pull/9455.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9455/head:pull/9455 PR: https://git.openjdk.org/jdk/pull/9455 From eosterlund at openjdk.org Mon Jul 25 09:39:44 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 25 Jul 2022 09:39:44 GMT Subject: RFR: 8290074: Remove implicit arguments for RegisterMap constructor In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 14:58:07 GMT, Axel Boldt-Christmas wrote: > 8290074: Remove implicit arguments for RegisterMap constructor Looks good except for a spurious semicolon. Don't need another review for that. src/hotspot/share/runtime/frame.cpp line 85: > 83: RegisterMap::RegisterMap(oop continuation, UpdateMap update_map) { > 84: _thread = NULL; > 85: _update_map = update_map == UpdateMap::yes;; One semicolon too many. ------------- Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9455 From duke at openjdk.org Mon Jul 25 09:39:50 2022 From: duke at openjdk.org (Axel Boldt-Christmas) Date: Mon, 25 Jul 2022 09:39:50 GMT Subject: RFR: 8290062: Remove nmethodLocker for nmethods on-stack Message-ID: 8290062: Remove nmethodLocker for nmethods on-stack ------------- Commit messages: - 8290062: Remove nmethodLocker for nmethods on-stack Changes: https://git.openjdk.org/jdk/pull/9444/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9444&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290062 Stats: 11 lines in 1 file changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9444.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9444/head:pull/9444 PR: https://git.openjdk.org/jdk/pull/9444 From eosterlund at openjdk.org Mon Jul 25 09:39:50 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 25 Jul 2022 09:39:50 GMT Subject: RFR: 8290062: Remove nmethodLocker for nmethods on-stack In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 07:49:09 GMT, Axel Boldt-Christmas wrote: > 8290062: Remove nmethodLocker for nmethods on-stack Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9444 From duke at openjdk.org Mon Jul 25 09:40:53 2022 From: duke at openjdk.org (Axel Boldt-Christmas) Date: Mon, 25 Jul 2022 09:40:53 GMT Subject: RFR: 8290090: Change CodeBlobType from unscoped enum to enum class Message-ID: 8290090: Change CodeBlobType from unscoped enum to enum class ------------- Commit messages: - 8290090: Change CodeBlobType from unscoped enum to enum class Changes: https://git.openjdk.org/jdk/pull/9460/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9460&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290090 Stats: 78 lines in 11 files changed: 7 ins; 4 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/9460.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9460/head:pull/9460 PR: https://git.openjdk.org/jdk/pull/9460 From eosterlund at openjdk.org Mon Jul 25 09:40:54 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 25 Jul 2022 09:40:54 GMT Subject: RFR: 8290090: Change CodeBlobType from unscoped enum to enum class In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 07:55:49 GMT, Axel Boldt-Christmas wrote: > 8290090: Change CodeBlobType from unscoped enum to enum class Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.org/jdk/pull/9460 From aph at openjdk.org Mon Jul 25 12:41:12 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 25 Jul 2022 12:41:12 GMT Subject: RFR: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java [v2] In-Reply-To: <1TnWBvAb0dQhi5nMBU9K1XLTCg9DsD5Cyi6d2QrDGlk=.04ebc10a-8652-47ad-9fc3-82a66c634a26@github.com> References: <3dWjq0B4Au6IqDRbu-onhNfOvlbSg8pofH0YHHuJAL8=.06ccad4d-9006-4a30-b76f-64074d767f5a@github.com> <1TnWBvAb0dQhi5nMBU9K1XLTCg9DsD5Cyi6d2QrDGlk=.04ebc10a-8652-47ad-9fc3-82a66c634a26@github.com> Message-ID: <3wJ0bG0dRyZ-gbg8PhN2WJqflHwZLcNI0iv95dnLa2k=.3955f57e-06f9-46dc-84b2-a18c5de083b4@github.com> On Fri, 22 Jul 2022 19:31:54 GMT, Andrew Haley wrote: > > Looks good. To detect bad addresses at emit time, you could add some asserts that check is_valid_AArch64_address(target) in _adrp() and the patch code. > > I'll look at that. I'm not going to do that because I think I remember in some cases we have generated a fake "pointer" which is used as a base address. Such a value may not be a legal address. I don't know that there's anywhere we do that now, but it is a reasonable-enough thing to do. ------------- PR: https://git.openjdk.org/jdk/pull/9615 From aph at openjdk.org Mon Jul 25 12:43:20 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 25 Jul 2022 12:43:20 GMT Subject: Integrated: 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 13:50:28 GMT, Andrew Haley wrote: > Fix that masks the offsets used when adrp() is passed an unreachable destination. This reloc allows e.g. `adrp; movk; ldr` to access anywhere in the address space.
> > > # SIGSEGV (0xb) at pc=0x0000ffff55964edc, pid=2843096, tid=2850366 > # > # JRE version: Java(TM) SE Runtime Environment (20.0+7) (fastdebug build 20-ea+7-377) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+7-377, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # J 91101 c2 java.io.ObjectOutputStream.enableReplaceObject(Z)Z java.base at 20-ea (47 bytes) @ 0x0000ffff55964edc [0x0000ffff55964e80+0x000000000000005c] This pull request has now been integrated. Changeset: 1e270ea4 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/1e270ea4f5e8f9539e85430b9be5cf21a89b4d8f Stats: 28 lines in 2 files changed: 4 ins; 20 del; 4 mod 8290780: AArch64: Crash in c2 nmethod running RunThese30M.java Reviewed-by: dlong ------------- PR: https://git.openjdk.org/jdk/pull/9615 From thartmann at openjdk.org Mon Jul 25 12:57:10 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 25 Jul 2022 12:57:10 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v7] In-Reply-To: References: Message-ID: On Sat, 23 Jul 2022 13:59:59 GMT, Julian Waters wrote: >> Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter) and tier 3, while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly confusing. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fixup Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9598 From aph at openjdk.org Mon Jul 25 14:03:07 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 25 Jul 2022 14:03:07 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work Good. It's nice to get rid of that code. ------------- Marked as reviewed by aph (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9576 From aph at openjdk.org Mon Jul 25 14:03:10 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 25 Jul 2022 14:03:10 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: <-V7eK43hiIlGl9L-MPr03OF3Pzw0QfzuyHd2MxU2_FY=.5b404def-5528-4911-a231-ce0886804196@github.com> On Mon, 25 Jul 2022 09:05:16 GMT, Aleksey Shipilev wrote: > > Looks good in general. However, the StorePConditionalNode is dead after this change, and I think we should remove that one as well while we are at it. AFAIK it is only used for the atomic bump pointer code. There is a whole bunch of StorePConditional matching rules in all the ad files that can be nuked, and a few mentions of Op_StorePConditional in C2 that can be removed. > > True. I'd rather do that after this change lands, as to not invalidate the platform testing already done here. It would be also beneficial for potential bisects to have two atomic commits -- one removing the code from allocation paths, and another removing the C2 nodes and match rules. WDYT? Yes, I agree. There's more cruft in there. ------------- PR: https://git.openjdk.org/jdk/pull/9576 From tschatzl at openjdk.org Mon Jul 25 14:21:25 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 25 Jul 2022 14:21:25 GMT Subject: RFR: 8290966: G1: Record number of PLAB filled and number of direct allocations Message-ID: Hi all, for evaluation in [JDK-8288966](https://bugs.openjdk.org/browse/JDK-8288966) I added statistics output that show the amount of PLAB fills and direct allocations; I think this is useful for similar evaluations in the future, so I kept and split it out from that change. Adds these values to the existing JFR event too. 
Testing: PLAB related tests, gha Thanks, Thomas ------------- Commit messages: - Refill/direct allocation stats; fix test regex Changes: https://git.openjdk.org/jdk/pull/9626/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9626&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290966 Stats: 61 lines in 9 files changed: 41 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/9626.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9626/head:pull/9626 PR: https://git.openjdk.org/jdk/pull/9626 From rrich at openjdk.org Mon Jul 25 14:39:07 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 25 Jul 2022 14:39:07 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work No findings in our CI testing which includes JCK and JTREG tests on the standard platforms and also on Linux/PPC64le. I've reviewed the ppc changes. They look good to me. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.org/jdk/pull/9576 From eosterlund at openjdk.org Mon Jul 25 14:52:52 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 25 Jul 2022 14:52:52 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work Looks good.
------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/9576 From eosterlund at openjdk.org Mon Jul 25 14:52:52 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 25 Jul 2022 14:52:52 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: <-V7eK43hiIlGl9L-MPr03OF3Pzw0QfzuyHd2MxU2_FY=.5b404def-5528-4911-a231-ce0886804196@github.com> References: <-V7eK43hiIlGl9L-MPr03OF3Pzw0QfzuyHd2MxU2_FY=.5b404def-5528-4911-a231-ce0886804196@github.com> Message-ID: On Mon, 25 Jul 2022 13:59:20 GMT, Andrew Haley wrote: > > Looks good in general. However, the StorePConditionalNode is dead after this change, and I think we should remove that one as well while we are at it. AFAIK it is only used for the atomic bump pointer code. There is a whole bunch of StorePConditional matching rules in all the ad files that can be nuked, and a few mentions of Op_StorePConditional in C2 that can be removed. > > True. I'd rather do that after this change lands, as to not invalidate the platform testing already done here. It would be also beneficial for potential bisects to have two atomic commits -- one removing the code from allocation paths, and another removing the C2 nodes and match rules. WDYT? Sounds good to me. ------------- PR: https://git.openjdk.org/jdk/pull/9576 From kvn at openjdk.org Mon Jul 25 17:20:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 25 Jul 2022 17:20:40 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v7] In-Reply-To: References: Message-ID: On Sat, 23 Jul 2022 13:59:59 GMT, Julian Waters wrote: >> Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter) and tier 3, while tier 1 (Fully optimizing C1) does not collect any profile data at all.
Additionally, the description for the different execution tiers is slightly confusing. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fixup Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9598 From jwaters at openjdk.org Mon Jul 25 17:32:48 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 25 Jul 2022 17:32:48 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v7] In-Reply-To: References: Message-ID: <6x_al97Iyv5cUa6ExZGsQZ4PJP0uus3vrAtw7OG4Mu4=.ee74aa10-4666-4fe9-ad70-833b58172767@github.com> On Sat, 23 Jul 2022 13:59:59 GMT, Julian Waters wrote: >> Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter) and tier 3, while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly confusing. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fixup Thanks all for the reviews! 
------------- PR: https://git.openjdk.org/jdk/pull/9598 From kvn at openjdk.org Mon Jul 25 17:36:07 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 25 Jul 2022 17:36:07 GMT Subject: RFR: 8290090: Change CodeBlobType from unscoped enum to enum class In-Reply-To: References: Message-ID: On Tue, 12 Jul 2022 07:55:49 GMT, Axel Boldt-Christmas wrote:
> Change:
> ```C++
> struct CodeBlobType {
>   enum { [...] };
> };
> ```
> To:
> ```C++
> enum class CodeBlobType {
>   [...]
> };
> ```
>
> Using C++11 scoped enums provides a more clear view of intent, as enums can be enforced by the type system instead of being passed around as ints.

Looks good. I assume you tested it. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9460 From kvn at openjdk.org Mon Jul 25 17:54:05 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 25 Jul 2022 17:54:05 GMT Subject: RFR: 8290062: Remove nmethodLocker for nmethods on-stack In-Reply-To: References: Message-ID: <7z4n9Pk0-qU8VKfPLHU7hrJ3W4cohWyeKXlQhjvxJrE=.272a7ee4-116a-431a-8a3f-1ea0dbdbb633@github.com> On Mon, 11 Jul 2022 07:49:09 GMT, Axel Boldt-Christmas wrote: > From JBS: > >> The nmethodLocker is pretty nasty as it prevents an nmethod from being freed, but without really keeping it alive. We would like to minimize its use. The most obvious places where it can be removed, is when "protecting" nmethods that are already on-stack. Neither the sweeper nor the GC is interested in making nmethods on-stack not live. These ones simply do not do anything. > > Removed the `nmethodLocker` where the nmethod is a caller on the stack. An nmethod could be marked for deoptimization (`non_entrant`) even if it has a stack frame. `nmethod::make_not_entrant_or_zombie()` uses `nmethodLocker` for that. There are other changes to nmethod code that could be done under this lock. How did you determine which `nmethodLocker` could be safely removed? This should be tested with a lot of tiers (at least up to tier 7).
Add testing link to RFE. ------------- PR: https://git.openjdk.org/jdk/pull/9444 From iklam at openjdk.org Mon Jul 25 21:54:03 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 25 Jul 2022 21:54:03 GMT Subject: RFR: 8290840: Refactor the "os" class In-Reply-To: <4hI3UWcrI1Wp5F7aFAsbzq32pwqbBQ_IpQHaaR_J-DM=.fdbd805b-3132-4e69-a052-784a69a3150c@github.com> References: <4hI3UWcrI1Wp5F7aFAsbzq32pwqbBQ_IpQHaaR_J-DM=.fdbd805b-3132-4e69-a052-784a69a3150c@github.com> Message-ID: On Mon, 25 Jul 2022 01:59:48 GMT, David Holmes wrote: >> Please see [JDK-8290840](https://bugs.openjdk.org/browse/JDK-8290840) for the detailed proposal. >> >> The `os` class, declared in os.hpp, forms the major part of the HotSpot porting interface. Its structure has gradually deteriorated over the years as new ports are created and new APIs are added. >> >> This RFE tries to address the following: >> >> - Clearly specify where a porting API should be declared and defined among the various `os*.cpp` and `os*.hpp` files. >> - Avoid the inappropriate inclusion of OS-specific APIs (such as the `os::Linux class`) by platform-independent source files. > > src/hotspot/share/runtime/os.hpp line 86: > >> 84: // >> 85: // - src/hotspot/os//os_.inline.hpp >> 86: // - src/hotspot/os_cpu/_/os__.inline.hpp > > They have to be inline? Why? I would expect `os_.cpp` to be able to define implementations of a platform specific os.hpp function. I meant to say:
// ... must be implemented in the following *four* files:
// - src/hotspot/os//os_.inline.hpp
// - src/hotspot/os_cpu/_/os__.inline.hpp
// [explanation for the hpp files]
// - src/hotspot/os//os_.cpp
// - src/hotspot/os_cpu/_/os__.cpp
// [explanation for the cpp files]
I'll change the order of the text to make it clear.
------------- PR: https://git.openjdk.org/jdk/pull/9600 From dlong at openjdk.org Mon Jul 25 22:05:26 2022 From: dlong at openjdk.org (Dean Long) Date: Mon, 25 Jul 2022 22:05:26 GMT Subject: RFR: 8290834: Improve potentially confusing documentation on collection of profiling information [v7] In-Reply-To: References: Message-ID: <1MQcV55lgirNA2zPA0N1YTJliGPaJ9JiwCZmhcpQN1s=.2b6d3278-449e-4ab1-b854-e6e2715a373a@github.com> On Sat, 23 Jul 2022 13:59:59 GMT, Julian Waters wrote: >> Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter) and tier 3, while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly confusing. This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fixup Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9598 From jwaters at openjdk.org Mon Jul 25 22:50:05 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 25 Jul 2022 22:50:05 GMT Subject: Integrated: 8290834: Improve potentially confusing documentation on collection of profiling information In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 18:36:43 GMT, Julian Waters wrote: > Documentation on the MethodData object incorrectly states that it is used when profiling in tiers 0 and 1, when it only does so for tier 0 (Interpreter) and tier 3, while tier 1 (Fully optimizing C1) does not collect any profile data at all. Additionally, the description for the different execution tiers is slightly confusing. 
This cleanup attempts to slightly better clarify how profiling is tied together between the Interpreter and C1, explain what MDO is an abbreviation for (MethodData object), and corrects the documentation for MethodData as well. This pull request has now been integrated. Changeset: 0ca5cb13 Author: Julian Waters Committer: Dean Long URL: https://git.openjdk.org/jdk/commit/0ca5cb13a38105a4334ac3508a9c7155fc00cac3 Stats: 16 lines in 3 files changed: 11 ins; 0 del; 5 mod 8290834: Improve potentially confusing documentation on collection of profiling information Reviewed-by: thartmann, kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/9598 From iklam at openjdk.org Mon Jul 25 22:54:15 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 25 Jul 2022 22:54:15 GMT Subject: RFR: 8290840: Refactor the "os" class [v2] In-Reply-To: References: Message-ID: <8vUXDokoOuVschkyCtu5BNpv8ZYIv01RllzKWhdZdXQ=.aa253ce1-7e3b-4efc-9b90-930a2a9b3ea5@github.com> > Please see [JDK-8290840](https://bugs.openjdk.org/browse/JDK-8290840) for the detailed proposal. > > The `os` class, declared in os.hpp, forms the major part of the HotSpot porting interface. Its structure has gradually deteriorated over the years as new ports are created and new APIs are added. > > This RFE tries to address the following: > > - Clearly specify where a porting API should be declared and defined among the various `os*.cpp` and `os*.hpp` files. > - Avoid the inappropriate inclusion of OS-specific APIs (such as the `os::Linux class`) by platform-independent source files. 
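The split described in this proposal (declare a porting API once in shared code, define it per platform, and keep shared code away from platform-specific classes) can be sketched in a single file. This is only an illustration under assumptions: the namespace `os_sketch` and the functions `path_separator` and `ends_with_separator` are hypothetical stand-ins, not HotSpot's actual `os` class or its file layout.

```cpp
#include <cassert>
#include <cstring>

// Shared declaration: what would live in a platform-independent header.
namespace os_sketch {
    const char* path_separator();
}

// Shared code: depends only on the declaration above, never on a
// platform-specific header or class.
bool ends_with_separator(const char* path) {
    assert(path != nullptr);
    std::size_t n = std::strlen(path);
    return n > 0 && path[n - 1] == os_sketch::path_separator()[0];
}

// Per-platform definitions: in a real source tree each branch would sit in
// its own platform-specific source file, and only one would be compiled.
#if defined(_WIN32)
const char* os_sketch::path_separator() { return "\\"; }
#else
const char* os_sketch::path_separator() { return "/"; }
#endif
```

The shared function compiles unchanged on every platform; only the definition chosen at build time differs.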
Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - moved os::{print_active_locale, print_user_info} to os_posix.cpp - Fixed os.hpp comments per @dholmes-ora review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9600/files - new: https://git.openjdk.org/jdk/pull/9600/files/5516640c..9f853295 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9600&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9600&range=00-01 Stats: 52 lines in 6 files changed: 3 ins; 31 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/9600.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9600/head:pull/9600 PR: https://git.openjdk.org/jdk/pull/9600 From dholmes at openjdk.org Tue Jul 26 00:03:59 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 26 Jul 2022 00:03:59 GMT Subject: RFR: 8290840: Refactor the "os" class [v2] In-Reply-To: <8vUXDokoOuVschkyCtu5BNpv8ZYIv01RllzKWhdZdXQ=.aa253ce1-7e3b-4efc-9b90-930a2a9b3ea5@github.com> References: <8vUXDokoOuVschkyCtu5BNpv8ZYIv01RllzKWhdZdXQ=.aa253ce1-7e3b-4efc-9b90-930a2a9b3ea5@github.com> Message-ID: On Mon, 25 Jul 2022 22:54:15 GMT, Ioi Lam wrote: >> Please see [JDK-8290840](https://bugs.openjdk.org/browse/JDK-8290840) for the detailed proposal. >> >> The `os` class, declared in os.hpp, forms the major part of the HotSpot porting interface. Its structure has gradually deteriorated over the years as new ports are created and new APIs are added. >> >> This RFE tries to address the following: >> >> - Clearly specify where a porting API should be declared and defined among the various `os*.cpp` and `os*.hpp` files. >> - Avoid the inappropriate inclusion of OS-specific APIs (such as the `os::Linux class`) by platform-independent source files. 
> > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - moved os::{print_active_locale, print_user_info} to os_posix.cpp > - Fixed os.hpp comments per @dholmes-ora review Updates look good, thanks. Overall this seems good to me. I think there may be more refinement possible with regards to the os::XXX classes, but that can be future cleanup if you want to proceed with this version for now. Thanks. Mis-clicked. Approved. src/hotspot/os/posix/os_posix.cpp line 573: > 571: ::umask(umsk); > 572: st->print("umask: %04o (", (unsigned) umsk); > 573: os::Posix::print_umask(st, umsk); There's really no reason `print_umask` has to be part of the os::Posix class, it could just be a static file function in os_posix.cpp. I suspect the same may be true for other things in os::Posix - I would think we only need things in os::Posix that are explicitly called from os::Linux or os::Bsd etc. ------------- PR: https://git.openjdk.org/jdk/pull/9600 Marked as reviewed by dholmes (Reviewer). From fyang at openjdk.org Tue Jul 26 01:34:06 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 26 Jul 2022 01:34:06 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly.
>> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work RISCV-specific changes looks good to me. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/9576 From kvn at openjdk.org Tue Jul 26 03:12:12 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Jul 2022 03:12:12 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: Message-ID: On Sat, 23 Jul 2022 13:18:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the generation of broadcasting a scalar in several ways: >> >> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. >> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. 
>> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay >> >> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>>
>>                                       Before          After
>> Benchmark                  Mode  Cnt  Score    Error  Score    Error  Units     Gain
>> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598  38.771 ± 0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464  38.603 ± 0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791  13.755 ± 0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781  13.663 ± 0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. >> >> This patch also removes some redundant code paths and renames some incorrectly named instructions. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - rename > - consolidate sse checks > - benchmark > - fix > - Merge branch 'master' into improveReplicate > - remove duplicate > - unsignness > - rematerializing input count > - fix comparison > - fix rematerialize, constant deduplication > - ... and 8 more: https://git.openjdk.org/jdk/compare/0599a05f...6c10f9ad I submitted testing.
------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Tue Jul 26 05:57:39 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 26 Jul 2022 05:57:39 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: Message-ID: On Sat, 23 Jul 2022 13:18:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the generation of broadcasting a scalar in several ways: >> >> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. >> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. >> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay >> >> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>>
>>                                       Before          After
>> Benchmark                  Mode  Cnt  Score    Error  Score    Error  Units     Gain
>> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598  38.771 ± 0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464  38.603 ± 0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791  13.755 ± 0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781  13.663 ± 0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. >> >> This patch also removes some redundant code paths and renames some incorrectly named instructions. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 18 commits: > > - rename > - consolidate sse checks > - benchmark > - fix > - Merge branch 'master' into improveReplicate > - remove duplicate > - unsignness > - rematerializing input count > - fix comparison > - fix rematerialize, constant deduplication > - ... and 8 more: https://git.openjdk.org/jdk/compare/0599a05f...6c10f9ad src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1662: > 1660: case 64: vmovups(dst, src, Assembler::AVX_512bit); break; > 1661: default: ShouldNotReachHere(); > 1662: } Vector Load/store from memory happens from dedicated ports, can you elaborate why this change will benefit. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4388: > 4386: > 4387: void MacroAssembler::vallones(XMMRegister dst, int vector_len) { > 4388: // vpcmpeqd has special dependency treatment so it should be preferred to vpternlogd Comment is not clear, adding relevant reference will add more value. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From shade at openjdk.org Tue Jul 26 06:06:12 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Jul 2022 06:06:12 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. 
>> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work All right, thank you all, I will be integrating soon then. @TobiHartmann, @vnkozlov -- want to sanity check the C2 refactor? ------------- PR: https://git.openjdk.org/jdk/pull/9576 From shade at openjdk.org Tue Jul 26 07:17:52 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Jul 2022 07:17:52 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: <-V7eK43hiIlGl9L-MPr03OF3Pzw0QfzuyHd2MxU2_FY=.5b404def-5528-4911-a231-ce0886804196@github.com> Message-ID: <_g-3cjOMaECSn09yxtbIIS8NhhFCECHGyx4Nwc7kHlY=.6d487aaf-b49f-406d-af39-e26fc8d9962a@github.com> On Mon, 25 Jul 2022 14:48:50 GMT, Erik Österlund wrote: > Sounds good to me.
FYI, that would hopefully be in #9636 -- we can remove quite a bit more than just `StorePConditional` :) ------------- PR: https://git.openjdk.org/jdk/pull/9576 From jbhateja at openjdk.org Tue Jul 26 08:07:47 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 26 Jul 2022 08:07:47 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 05:53:31 GMT, Jatin Bhateja wrote: > Vector Load/store from memory happens from dedicated ports, can you elaborate why this change will benefit. Above reference to section 3.5.5.2 also states that FP loads add another cycle of latency, but saving the cycles penalty due to bypass between FP and SIMD domains still holds good. So maybe for loads there is no pressing need and existing load vector handling can be kept as it is. Overall savings from constant table size reductions are very impressive. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From bulasevich at openjdk.org Tue Jul 26 09:09:44 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 26 Jul 2022 09:09:44 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 11:38:10 GMT, Sergey Nazarkin wrote: > Got assert on arm-server-fastdebug. (Update) master seems affected as well. So this is an unrelated issue. I will look into it. [JDK-8291003](https://bugs.openjdk.org/browse/JDK-8291003) ------------- PR: https://git.openjdk.org/jdk/pull/9576 From duke at openjdk.org Tue Jul 26 12:09:16 2022 From: duke at openjdk.org (duke) Date: Tue, 26 Jul 2022 12:09:16 GMT Subject: Withdrawn: 8287550: Improve stack bang sp update granularity In-Reply-To: References: Message-ID: On Tue, 31 May 2022 08:05:22 GMT, ldaxr wrote: > Align sp to page boundary when set to growth watermark in interpreter. > > The performance tests for a JMH case are shown below.
> [stack_bang_sp_align.csv](https://github.com/openjdk/jdk/files/8803249/stack_bang_sp_align.csv) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/8951 From jwaters at openjdk.org Tue Jul 26 12:19:34 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 26 Jul 2022 12:19:34 GMT Subject: RFR: 8291002: Rename Method::build_interpreter_method_data to Method::build_profiling_method_data Message-ID: As mentioned in the review process for [JDK-8290834](https://bugs.openjdk.org/browse/JDK-8290834) `build_interpreter_method_data` is misleading because it is actually used for creating MethodData*s throughout HotSpot, not just in the interpreter. Renamed the method to `build_profiling_method_data` instead to more accurately describe what it is used for. ------------- Commit messages: - Rename build_interpreter_method_data -> build_profiling_method_data Changes: https://git.openjdk.org/jdk/pull/9637/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9637&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291002 Stats: 12 lines in 9 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/9637.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9637/head:pull/9637 PR: https://git.openjdk.org/jdk/pull/9637 From duke at openjdk.org Tue Jul 26 12:32:12 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Tue, 26 Jul 2022 12:32:12 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v9] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the generation of broadcasting a scalar in several ways: > > - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. 
> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. > - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay > > With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>
>                                       Before          After
> Benchmark                  Mode  Cnt  Score    Error  Score    Error  Units     Gain
> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598  38.771 ± 0.797  ns/op   +9.03%
> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464  38.603 ± 0.367  ns/op   +8.62%
> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791  13.755 ± 0.375  ns/op  +33.17%
> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781  13.663 ± 0.387  ns/op  +23.22%
>
> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. > > This patch also removes some redundant code paths and renames some incorrectly named instructions. > > Thank you very much.
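As a quick sanity check of the Gain column in the benchmark results above: the figures appear to be the relative reduction in average time, (before - after) / before. A minimal sketch, assuming that interpretation (the function name `gain_percent` is hypothetical and is not part of the patch or of JMH):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative improvement of an average-time score, in percent.
// For avgt (ns/op) results lower is better, so a positive value means a speed-up.
double gain_percent(double before_ns, double after_ns) {
    assert(before_ns > 0.0);
    return (before_ns - after_ns) / before_ns * 100.0;
}
```

For example, `gain_percent(42.621, 38.771)` evaluates to roughly 9.03, which matches the `testDouble` row.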
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7832/files - new: https://git.openjdk.org/jdk/pull/7832/files/6c10f9ad..6ec8519f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=07-08 Stats: 38 lines in 4 files changed: 0 ins; 14 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/7832.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7832/head:pull/7832 PR: https://git.openjdk.org/jdk/pull/7832 From duke at openjdk.org Tue Jul 26 12:32:14 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Tue, 26 Jul 2022 12:32:14 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v2] In-Reply-To: References: <1FBk3MauXFxUsyHz9kuhqGI-CtLRgHYmHn1eyyaDLvs=.6d4d94b0-32a0-42dc-a181-87df8d8f3b65@github.com> Message-ID: On Wed, 16 Mar 2022 17:25:53 GMT, Jatin Bhateja wrote: >>> Hi, forwarding results within the same bypass domain does not result in delay, data bypass delay happens when the data crosses different domains, according to "Intel® 64 and IA-32 Architectures Optimization Reference Manual" >>> >>> > When a source of a micro-op executed in one stack comes from a micro-op executed in another stack, a delay can occur. The delay occurs also for transitions between Intel SSE integer and Intel SSE floating-point operations. In some of the cases, the data transition is done using a micro-op that is added to the instruction flow. >>> >>> The manual mentions the guideline at section 3.5.2.2 >>> >>> ![image](https://user-images.githubusercontent.com/49088128/158618209-c0674ba7-1c93-4014-a7e1-330f4e5846da.png) >>> >>> Thanks. >> >> Thanks meant to refer to above text. I have removed incorrect reference.
> >> > Hi, forwarding results within the same bypass domain does not result in delay, data bypass delay happens when the data crosses different domains, according to "Intel® 64 and IA-32 Architectures Optimization Reference Manual" >> > > When a source of a micro-op executed in one stack comes from a micro-op executed in another stack, a delay can occur. The delay occurs also for transitions between Intel SSE integer and Intel SSE floating-point operations. In some of the cases, the data transition is done using a micro-op that is added to the instruction flow. >> > >> > >> > The manual mentions the guideline at section 3.5.2.2 >> > ![image](https://user-images.githubusercontent.com/49088128/158618209-c0674ba7-1c93-4014-a7e1-330f4e5846da.png) >> > Thanks. >> >> Thanks meant to refer to above text. I have removed incorrect reference. > > It will still be good if we can come up with a micro benchmark, that shows the gain with the patch. @jatin-bhateja Thanks a lot for your comments, I have addressed them in the last commit ------------- PR: https://git.openjdk.org/jdk/pull/7832 From duke at openjdk.org Tue Jul 26 12:32:16 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Tue, 26 Jul 2022 12:32:16 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 08:04:55 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1662: >> >>> 1660: case 64: vmovups(dst, src, Assembler::AVX_512bit); break; >>> 1661: default: ShouldNotReachHere(); >>> 1662: } >> >> Vector Load/store from memory happens from dedicated ports, can you elaborate why this change will benefit. > >> Vector Load/store from memory happens from dedicated ports, can you elaborate why this change will benefit. > > Above reference to section 3.5.5.2 also states that FP loads add another cycle of latency, but saving the cycles penalty due to bypass between FP and SIMD domains still holds good.
So maybe for loads there is no pressing need and existing load vector handling can be kept as it is. > > Overall savings from constant table size reductions are very impressive. Thanks. Thanks for sharing, I have reverted the change here ------------- PR: https://git.openjdk.org/jdk/pull/7832 From duke at openjdk.org Tue Jul 26 12:52:16 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Tue, 26 Jul 2022 12:52:16 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: Message-ID: <0TH2Cv2t4pTvoEZ9c4MLAtMqTWF2_tHYwFq-Z_pmbbQ=.5ab14625-26c7-4cb8-914e-51b3059a69fb@github.com> On Tue, 26 Jul 2022 05:53:26 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: >> >> - rename >> - consolidate sse checks >> - benchmark >> - fix >> - Merge branch 'master' into improveReplicate >> - remove duplicate >> - unsignness >> - rematerializing input count >> - fix comparison >> - fix rematerialize, constant deduplication >> - ... and 8 more: https://git.openjdk.org/jdk/compare/0599a05f...6c10f9ad > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4388: > >> 4386: >> 4387: void MacroAssembler::vallones(XMMRegister dst, int vector_len) { >> 4388: // vpcmpeqd has special dependency treatment so it should be preferred to vpternlogd > > Comment is not clear, adding relevant reference will add more value. I have remeasured the statement, and it seems that only the non-VEX encoded version receives the special dependency treatment, so I reverted this change and added a comment for clarification. The optimisation is noted in [The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers](https://www.agner.org/optimize/) for several architectures, for example in section 9.8 (Register allocation and renaming in Sandy Bridge and Ivy Bridge pipeline).
I have performed measurements on uica.uops.info. While this sequence gives 1.37 cycles/iteration on Skylake and Icelake:

    pcmpeqd xmm0, xmm0
    paddd xmm0, xmm1
    paddd xmm0, xmm1
    paddd xmm0, xmm1

this version has a throughput of 4 cycles/iteration:

    vpcmpeqd xmm0, xmm0, xmm0
    vpaddd xmm0, xmm1, xmm0
    vpaddd xmm0, xmm1, xmm0
    vpaddd xmm0, xmm1, xmm0

This indicates that `vpcmpeqd` fails to break the dependency on `xmm0`, as opposed to the `pcmpeqd` instruction. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From thartmann at openjdk.org Tue Jul 26 13:03:26 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 26 Jul 2022 13:03:26 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 16:17:38 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains two commits: > > - Merge branch 'master' into JDK-8290706-remove-inline-contig > - Work JIT changes look good to me. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 718: > 716: // prevent a degradation of the optimization. > 717: // See comment in memnode.hpp, around line 227 in class LoadPNode. > 718: Node *tlab_end = macro->make_load(toobig_false, mem, tlab_end_adr, 0, TypeRawPtr::BOTTOM, T_ADDRESS); Suggestion: Node* tlab_end = macro->make_load(toobig_false, mem, tlab_end_adr, 0, TypeRawPtr::BOTTOM, T_ADDRESS); Some more occurrences in the changes/code below. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/9576 From jvernee at openjdk.org Tue Jul 26 13:21:08 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 26 Jul 2022 13:21:08 GMT Subject: RFR: 8290373: Enable lossy conversion warnings on Windows [v4] In-Reply-To: References: Message-ID: On Mon, 25 Jul 2022 01:15:31 GMT, David Holmes wrote: > > I propose that when a patch touches a file with lossy conversion warnings disabled, it should also enable the warnings and fix them for that file. > > This contradicts other common recommendations of not mixing unrelated issues in the same PR. Fixing style nits is harmless but fixing an actual lossy conversion issue could itself lead to bugs and unexpected fanout. That's a good point. I think that takes piggybacking on other patches off the table then.
------------- PR: https://git.openjdk.org/jdk/pull/9516 From duke at openjdk.org Tue Jul 26 13:26:04 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Tue, 26 Jul 2022 13:26:04 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v10] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the generation of broadcasting a scalar in several ways: > > - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. > - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. > - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay > > With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows: > > Before After > Benchmark Mode Cnt Score Error Score Error Units Gain > SpiltReplicate.testDouble avgt 5 42.621 ± 0.598 38.771 ± 0.797 ns/op +9.03% > SpiltReplicate.testFloat avgt 5 42.245 ± 1.464 38.603 ± 0.367 ns/op +8.62% > SpiltReplicate.testInt avgt 5 20.581 ± 5.791 13.755 ± 0.375 ns/op +33.17% > SpiltReplicate.testLong avgt 5 17.794 ± 4.781 13.663 ± 0.387 ns/op +23.22% > > As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. > > This patch also removes some redundant code paths and renames some incorrectly named instructions. > > Thank you very much.
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: replI_mem ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7832/files - new: https://git.openjdk.org/jdk/pull/7832/files/6ec8519f..c049d542 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=08-09 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/7832.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7832/head:pull/7832 PR: https://git.openjdk.org/jdk/pull/7832 From shade at openjdk.org Tue Jul 26 14:14:04 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Jul 2022 14:14:04 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v3] In-Reply-To: References: Message-ID: > See the bug for rationale and link to RFC. > > This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux AArch64 fastdebug `tier1` > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) > - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) > - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) > - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) > - [x] Linux RISC-V cross-build (attn @RealFYang) > > Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. > > I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. 
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Make sure stars are aligned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9576/files - new: https://git.openjdk.org/jdk/pull/9576/files/dd2f3ea2..6a1c2020 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9576&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9576&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9576.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9576/head:pull/9576 PR: https://git.openjdk.org/jdk/pull/9576 From shade at openjdk.org Tue Jul 26 14:14:08 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Jul 2022 14:14:08 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v2] In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 12:58:28 GMT, Tobias Hartmann wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' into JDK-8290706-remove-inline-contig >> - Work > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 718: > >> 716: // prevent a degradation of the optimization. >> 717: // See comment in memnode.hpp, around line 227 in class LoadPNode. >> 718: Node *tlab_end = macro->make_load(toobig_false, mem, tlab_end_adr, 0, TypeRawPtr::BOTTOM, T_ADDRESS); > > Suggestion: > > Node* tlab_end = macro->make_load(toobig_false, mem, tlab_end_adr, 0, TypeRawPtr::BOTTOM, T_ADDRESS); > > > Some more occurences in below changes/code. Right, I fixed those in new commit. 
------------- PR: https://git.openjdk.org/jdk/pull/9576 From duke at openjdk.org Tue Jul 26 14:26:05 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Tue, 26 Jul 2022 14:26:05 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jul 2022 08:15:50 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace trampoline_call1 with trampoline_call > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 637: > >> 635: // code. >> 636: PhaseOutput* phase_output = Compile::current()->output(); >> 637: in_scratch_emit_size = > > Looks reasonable enough. The only change is to check for `Compile::current()->output()` being null, right? Hi @theRealAph, I am sorry I did not get your comment. Could you please explain it? Thanks, Evgeny ------------- PR: https://git.openjdk.org/jdk/pull/9592 From thartmann at openjdk.org Tue Jul 26 14:36:05 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 26 Jul 2022 14:36:05 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v3] In-Reply-To: References: Message-ID: <15K7cW30e9RL2ggnabuThT9STJhAhNToJ1dgbMD5CEg=.3b1d702b-25e7-4a54-92ea-03bf4a0639e7@github.com> On Tue, 26 Jul 2022 14:14:04 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. 
>> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Make sure stars are aligned Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9576 From aph at openjdk.org Tue Jul 26 14:46:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 26 Jul 2022 14:46:05 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2] In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 14:22:42 GMT, Evgeny Astigeevich wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 637: >> >>> 635: // code. >>> 636: PhaseOutput* phase_output = Compile::current()->output(); >>> 637: in_scratch_emit_size = >> >> Looks reasonable enough. The only change is to check for `Compile::current()->output()` being null, right? > > Hi @theRealAph, > I am sorry I did not get your comment. Could you please explain it? > > Thanks, > Evgeny The addition is 'PhaseOutput* phase_output = Compile::current()->output();' then 'phase_output != NULL && phase_output->in_scratch_emit_size()' so AFAICS `Compile::current()->output()` is now checked for null, where it was not before. 
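A minimal sketch of the null-check pattern described above, for readers following the thread. `PhaseOutput` and `Compile` here are hypothetical stand-ins with only the members needed for the illustration; they are not the real HotSpot classes, and the helper name `in_scratch_emit_size` is likewise an assumption:

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's PhaseOutput (illustration only).
struct PhaseOutput {
  bool scratch;
  bool in_scratch_emit_size() const { return scratch; }
};

// Hypothetical stand-in for HotSpot's Compile (illustration only).
struct Compile {
  PhaseOutput* out;  // may be null if no output phase has been set up
  PhaseOutput* output() const { return out; }
};

// Same shape as the check quoted above: load the pointer once, then test it
// for null before dereferencing it.
bool in_scratch_emit_size(const Compile& c) {
  PhaseOutput* phase_output = c.output();
  return phase_output != nullptr && phase_output->in_scratch_emit_size();
}
```

With the short-circuiting `&&`, a null `output()` simply yields `false` instead of being dereferenced.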
------------- PR: https://git.openjdk.org/jdk/pull/9592 From eliu at openjdk.org Tue Jul 26 15:09:08 2022 From: eliu at openjdk.org (Eric Liu) Date: Tue, 26 Jul 2022 15:09:08 GMT Subject: RFR: 8284990: AArch64: Remove STXR_PREFETCH from CPU features Message-ID: As STXR_PREFETCH is usually done unconditionally in non-JVM code, e.g., Linux ARM Kernel[1], this patch removes VM_Version::CPU_STXR_PREFETCH and generates it unconditionally. [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1436779519-2232-16-git-send-email-will.deacon at arm.com/ [TEST] Full Jtreg passed without new failure. ------------- Commit messages: - 8284990: AArch64: Remove STXR_PREFETCH from CPU features Changes: https://git.openjdk.org/jdk/pull/9641/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9641&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8284990 Stats: 18 lines in 5 files changed: 0 ins; 11 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/9641.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9641/head:pull/9641 PR: https://git.openjdk.org/jdk/pull/9641 From aph at openjdk.org Tue Jul 26 15:18:03 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 26 Jul 2022 15:18:03 GMT Subject: RFR: 8284990: AArch64: Remove STXR_PREFETCH from CPU features In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 14:59:19 GMT, Eric Liu wrote: > As STXR_PREFETCH is usually done unconditionally in non-JVM code, e.g., > Linux ARM Kernel[1], this patch removes VM_Version::CPU_STXR_PREFETCH > and generates it unconditionally. > > [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1436779519-2232-16-git-send-email-will.deacon at arm.com/ > > [TEST] > Full Jtreg passed without new failure. Marked as reviewed by aph (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/9641 From ngasson at openjdk.org Tue Jul 26 15:47:10 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Tue, 26 Jul 2022 15:47:10 GMT Subject: RFR: 8284990: AArch64: Remove STXR_PREFETCH from CPU features In-Reply-To: References: Message-ID: <6Z0SMSdaXgo7ra8a-EgErE3TxMg2WElP1deVS768gTs=.d88140aa-b627-41d2-8970-cd0121a896b7@github.com> On Tue, 26 Jul 2022 14:59:19 GMT, Eric Liu wrote: > As STXR_PREFETCH is usually done unconditionally in non-JVM code, e.g., > Linux ARM Kernel[1], this patch removes VM_Version::CPU_STXR_PREFETCH > and generates it unconditionally. > > [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1436779519-2232-16-git-send-email-will.deacon at arm.com/ > > [TEST] > Full Jtreg passed without new failure. Looks good, thanks for doing this clean-up. ------------- Marked as reviewed by ngasson (Reviewer). PR: https://git.openjdk.org/jdk/pull/9641 From tschatzl at openjdk.org Tue Jul 26 16:30:50 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 26 Jul 2022 16:30:50 GMT Subject: RFR: 8290074: Remove implicit arguments for RegisterMap constructor In-Reply-To: References: Message-ID: <8txeHYkmwTumyrOClXP67vNUaCQYzDgiI2qIG-Qk3Jg=.6cc0c63b-b85c-4ca3-b418-e80ba2b1ea38@github.com> On Mon, 11 Jul 2022 14:58:07 GMT, Axel Boldt-Christmas wrote: > Currently the `RegisterMap` constructor uses implicit boolean arguments to configure its function. Implicit boolean arguments makes code harder to understand and reason about at the call site. Using explicit scoped enums instead makes it both clear what is being configured and the type safety makes mistakes less likely. > > Update `RegisterMap` constructors to use these scoped enum types instead of booleans. > ```C++ > enum class UpdateMap { skip, yes }; > enum class ProcessFrames { skip, yes }; > enum class WalkContinuation { skip, yes }; > > > Testing: tier1-3 Feel free to ignore my comment, seems good otherwise. 
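The scoped-enum approach from the quoted PR description can be sketched roughly as follows. The three enum declarations are taken from the description; `RegisterMapSketch`, its accessors, and `make_example_map` are hypothetical and omit whatever other constructor parameters the real `RegisterMap` takes:

```cpp
#include <cassert>

enum class UpdateMap        { skip, yes };
enum class ProcessFrames    { skip, yes };
enum class WalkContinuation { skip, yes };

// Hypothetical stand-in for RegisterMap (illustration only).
class RegisterMapSketch {
 public:
  RegisterMapSketch(UpdateMap update_map,
                    ProcessFrames process_frames,
                    WalkContinuation walk_cont)
    : _update_map(update_map == UpdateMap::yes),
      _process_frames(process_frames == ProcessFrames::yes),
      _walk_cont(walk_cont == WalkContinuation::yes) {}

  bool update_map() const     { return _update_map; }
  bool process_frames() const { return _process_frames; }
  bool walk_cont() const      { return _walk_cont; }

 private:
  bool _update_map;
  bool _process_frames;
  bool _walk_cont;
};

// Each argument names what it configures at the call site, unlike a bare
// (true, false, false) triple.
inline RegisterMapSketch make_example_map() {
  return RegisterMapSketch(UpdateMap::yes,
                           ProcessFrames::skip,
                           WalkContinuation::skip);
}
```

Because the enums are distinct types without implicit conversions, passing the arguments in the wrong order (or passing a plain `bool`) fails to compile.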
src/hotspot/share/runtime/registerMap.hpp line 75: > 73: enum class UpdateMap { skip, yes }; > 74: enum class ProcessFrames { skip, yes }; > 75: enum class WalkContinuation { skip, yes }; Instead of `yes` I would recommend using something like `include` (or `add`), as `yes` seems relatively unspecific compared to `skip`. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/9455 From kvn at openjdk.org Tue Jul 26 16:34:47 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Jul 2022 16:34:47 GMT Subject: RFR: 8291002: Rename Method::build_interpreter_method_data to Method::build_profiling_method_data In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 08:45:59 GMT, Julian Waters wrote: > As mentioned in the review process for [JDK-8290834](https://bugs.openjdk.org/browse/JDK-8290834) `build_interpreter_method_data` is misleading because it is actually used for creating `MethodData*`s throughout HotSpot, not just in the interpreter. Renamed the method to `build_profiling_method_data` instead to more accurately describe what it is used for. Good. Originally only the interpreter collected profiling data. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9637 From kvn at openjdk.org Tue Jul 26 16:43:03 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Jul 2022 16:43:03 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v10] In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 13:26:04 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the generation of broadcasting a scalar in several ways: >> >> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines.
>> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. >> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay >> >> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows: >> >> Before After >> Benchmark Mode Cnt Score Error Score Error Units Gain >> SpiltReplicate.testDouble avgt 5 42.621 ± 0.598 38.771 ± 0.797 ns/op +9.03% >> SpiltReplicate.testFloat avgt 5 42.245 ± 1.464 38.603 ± 0.367 ns/op +8.62% >> SpiltReplicate.testInt avgt 5 20.581 ± 5.791 13.755 ± 0.375 ns/op +33.17% >> SpiltReplicate.testLong avgt 5 17.794 ± 4.781 13.663 ± 0.387 ns/op +23.22% >> >> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. >> >> This patch also removes some redundant code paths and renames some incorrectly named instructions. >> >> Thank you very much.
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > replI_mem The testing of version 07 got failure when run vector tests with `-XX:UseAVX=0 -XX:UseSSE=2`: # Internal Error (/workspace/open/src/hotspot/share/opto/constantTable.cpp:217), pid=2750036, tid=2750067 # assert((constant_addr - _masm.code()->consts()->start()) == con.offset()) failed: must be: 8 == 0 Current CompileTask: C2: 287 29 % b compiler.codegen.TestByteVect::test_ci @ 2 (20 bytes) Stack: [0x00007f7abf144000,0x00007f7abf245000], sp=0x00007f7abf23fa30, free space=1006k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xb731c8] ConstantTable::emit(CodeBuffer&) const+0x1c8 V [libjvm.so+0x17c3673] PhaseOutput::fill_buffer(CodeBuffer*, unsigned int*)+0x293 V [libjvm.so+0xb191bb] Compile::Code_Gen()+0x42b V [libjvm.so+0xb1e899] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1699 and # Internal Error (/workspace/open/src/hotspot/cpu/x86/assembler_x86.cpp:5095), pid=1431469, tid=1431493 # Error: assert(VM_Version::supports_ssse3()) failed Current CompileTask: C2: 468 240 % b 4 java.util.Arrays::fill @ 5 (21 bytes) Stack: [0x00007fdecd422000,0x00007fdecd523000], sp=0x00007fdecd51d8c0, free space=1006k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x73079c] Assembler::pshufb(XMMRegisterImpl*, XMMRegisterImpl*)+0x13c V [libjvm.so+0x4005d1] ReplB_regNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x1a1 V [libjvm.so+0x17be04e] PhaseOutput::scratch_emit_size(Node const*)+0x45e V [libjvm.so+0x17b4548] PhaseOutput::shorten_branches(unsigned int*)+0x2d8 V [libjvm.so+0x17c6faa] PhaseOutput::Output()+0xcfa V [libjvm.so+0xb191bb] Compile::Code_Gen()+0x42b V [libjvm.so+0xb1e899] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1699 ------------- PR: https://git.openjdk.org/jdk/pull/7832 From shade at openjdk.org Tue Jul 26 17:23:05 
2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Jul 2022 17:23:05 GMT Subject: RFR: 8290706: Remove the support for inline contiguous allocations [v3] In-Reply-To: References: Message-ID: <8fLUFm0XdOuQeGWnS9o7U0LjPkBId-ZQZgvdI6z5CWk=.a634355f-4e5a-4529-9aee-3c00a5588323@github.com> On Tue, 26 Jul 2022 14:14:04 GMT, Aleksey Shipilev wrote: >> See the bug for rationale and link to RFC. >> >> This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux AArch64 fastdebug `tier1` >> - [x] Linux x86_64 Zero build >> - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) >> - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) >> - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) >> - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) >> - [x] Linux RISC-V cross-build (attn @RealFYang) >> >> Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. >> >> I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Make sure stars are aligned All right, thank you all. I am integrating and dealing with the rest of the issues, if any, later. 
------------- PR: https://git.openjdk.org/jdk/pull/9576 From shade at openjdk.org Tue Jul 26 17:23:07 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Jul 2022 17:23:07 GMT Subject: Integrated: 8290706: Remove the support for inline contiguous allocations In-Reply-To: References: Message-ID: On Wed, 20 Jul 2022 17:39:28 GMT, Aleksey Shipilev wrote: > See the bug for rationale and link to RFC. > > This removes the 3rd allocation path (first two being TLAB and native GC interface), that is used by Serial/Parallel when TLABs are not available. There is little sense in keeping this code, especially since it requires supporting a bunch of platform-specific assembly. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux AArch64 fastdebug `tier1` > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build (attn @theRealAph, @adinn) > - [x] Linux ARM cross-build (attn @bulasevich, @snazarkin) > - [x] Linux S390X cross-build (attn @backwaterred, @RealLucy) > - [x] Linux PPC64 cross-build (attn @TheRealMDoerr, @reinrich) > - [x] Linux RISC-V cross-build (attn @RealFYang) > > Apart from x86 and AArch64, I only verified the cross-compilation builds pass, no other testing is done. > > I did not touch the JVMCI interfaces, since I am not sure what is the proper protocol for JVMCI changes. This pull request has now been integrated. 
Changeset: 8159a1ab Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/8159a1ab708d7571814bbb1a60893b4a7379a082 Stats: 1009 lines in 45 files changed: 3 ins; 961 del; 45 mod 8290706: Remove the support for inline contiguous allocations Reviewed-by: eosterlund, aph, rrich, fyang, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/9576 From shade at openjdk.org Tue Jul 26 19:00:26 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Jul 2022 19:00:26 GMT Subject: RFR: 8291000: C2: Purge LoadPLocked and Store*Conditional nodes Message-ID: The last use of these nodes was in the inline contiguous allocations. With [JDK-8290706](https://bugs.openjdk.org/browse/JDK-8290706), these nodes are not used anymore and can be cleaned up. Testing: - [x] Linux x86_64 fastdebug tier1 - [ ] Linux x86_32 fastdebug tier1 - [ ] Linux AArch64 fastdebug tier1 - [x] Linux x86_64 Zero build - [x] Linux AArch64 cross-build - [x] Linux ARM cross-build - [x] Linux S390X cross-build - [x] Linux PPC64 cross-build - [x] Linux RISC-V cross-build ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/9636/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9636&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291000 Stats: 556 lines in 17 files changed: 0 ins; 552 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9636.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9636/head:pull/9636 PR: https://git.openjdk.org/jdk/pull/9636 From eosterlund at openjdk.org Tue Jul 26 19:15:30 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 26 Jul 2022 19:15:30 GMT Subject: RFR: 8291000: C2: Purge LoadPLocked and Store*Conditional nodes In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 07:13:01 GMT, Aleksey Shipilev wrote: > The last use of these nodes was in the inline contiguous allocations.
With [JDK-8290706](https://bugs.openjdk.org/browse/JDK-8290706), these nodes are not used anymore and can be cleaned up. > > Testing: > - [x] Linux x86_64 fastdebug tier1 > - [ ] Linux x86_32 fastdebug tier1 > - [ ] Linux AArch64 fastdebug tier1 > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build > - [x] Linux ARM cross-build > - [x] Linux S390X cross-build > - [x] Linux PPC64 cross-build > - [x] Linux RISC-V cross-build Nice change. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/9636 From kvn at openjdk.org Tue Jul 26 20:03:17 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Jul 2022 20:03:17 GMT Subject: RFR: 8291000: C2: Purge LoadPLocked and Store*Conditional nodes In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 07:13:01 GMT, Aleksey Shipilev wrote: > The last uses for these nodes was the inline contiguous allocations. With [JDK-8290706](https://bugs.openjdk.org/browse/JDK-8290706), these nodes are not used anymore and can be cleaned up. > > Testing: > - [x] Linux x86_64 fastdebug tier1 > - [x] Linux x86_32 fastdebug tier1 > - [ ] Linux AArch64 fastdebug tier1 > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build > - [x] Linux ARM cross-build > - [x] Linux S390X cross-build > - [x] Linux PPC64 cross-build > - [x] Linux RISC-V cross-build Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9636 From sangheki at openjdk.org Tue Jul 26 20:30:29 2022 From: sangheki at openjdk.org (Sangheon Kim) Date: Tue, 26 Jul 2022 20:30:29 GMT Subject: RFR: 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize [v3] In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 09:27:00 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this enhancement fixing a sometimes annoying UI issue where setting `-XX:MinTLABSize` does not automatically update `-XX:YoungPLABSize` and `-XX:OldPLABSize` if they are not set? >> This avoids some unnecessary retries. >> >> Testing: gha, test case >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review Looks good. ------------- Marked as reviewed by sangheki (Reviewer). PR: https://git.openjdk.org/jdk/pull/9425 From lzhai at openjdk.org Wed Jul 27 01:25:15 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 01:25:15 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant due to it can not be modified Message-ID: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Hi @stefank, When I tune `ZPageSizeSmall` from 2 MB to 16 MB: const size_t ZPlatformGranuleSizeShift = 24; // 16MB // Granule shift/size const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; // Page size shifts const size_t ZPageSizeSmallShift = ZGranuleSizeShift; // Page sizes const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; the `zBitField` assertion fails: Internal Error (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value # # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) # Java VM: OpenJDK 64-Bit Server VM (fastdebug
20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) Perhaps `mask` also needs to be adjusted: static size_t object_index(oop obj) { const uintptr_t addr = ZOop::to_address(obj); const uintptr_t offset = ZAddress::offset(addr); const uintptr_t mask = ZGranuleSize - 1; return (offset & mask) >> ZObjectAlignmentSmallShift; } Back to the point: `ZPlatformGranuleSizeShift` is redundant because it cannot be changed (for example, to `24`), so I removed `ZPlatformGranuleSizeShift` from the CPU backends and kept it only for AArch64, for debugging, to show the issue. Please review my patch. Thanks, Leslie Zhai ------------- Commit messages: - 8291106: ZPlatformGranuleSizeShift is redundant due to it can not be modified - ZPlatformGranuleSizeShift is redundant due to it can not be modified Changes: https://git.openjdk.org/jdk/pull/9582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291106 Stats: 20 lines in 6 files changed: 15 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9582/head:pull/9582 PR: https://git.openjdk.org/jdk/pull/9582 From eosterlund at openjdk.org Wed Jul 27 07:11:09 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 27 Jul 2022 07:11:09 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant due to it can not be modified In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: On Thu, 21 Jul 2022 02:17:33 GMT, Leslie Zhai wrote: > Hi, > > When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: > > > const size_t ZPlatformGranuleSizeShift = 24; // 16MB > > // Granule shift/size > const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; > > // Page size shifts > const size_t
ZPageSizeSmallShift = ZGranuleSizeShift; > > // Page sizes > const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; > > > `zBitField` failed to work: > > > Internal Error > (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 > # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) > > > Perhaps `mask` also need to be adjusted: > > > static size_t object_index(oop obj) { > const uintptr_t addr = ZOop::to_address(obj); > const uintptr_t offset = ZAddress::offset(addr); > const uintptr_t mask = ZGranuleSize - 1; > return (offset & mask) >> ZObjectAlignmentSmallShift; > } > > > Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. > > Please review my patch. > > Thanks, > Leslie Zhai I have many comments. But before diving into that - what is the actual problem that you are trying to solve? Do you have an AArch64 CPU that does not support 2M large pages, but does support 16M large pages? 
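For reference, the `object_index` computation quoted above can be sketched in isolation to show how the index range depends on the granule shift. This is a standalone illustration, not the ZGC code: the granule shift is passed as a parameter, `kObjectAlignmentSmallShift = 3` assumes 8-byte small-object alignment, and `index_bits` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed value for illustration: 8-byte small-object alignment.
constexpr size_t kObjectAlignmentSmallShift = 3;

// Same shape as the quoted object_index(), but operating on a raw offset and
// taking the granule shift as a parameter instead of a global constant.
constexpr size_t object_index(uintptr_t offset, size_t granule_shift) {
  const uintptr_t mask = (uintptr_t(1) << granule_shift) - 1;
  return (offset & mask) >> kObjectAlignmentSmallShift;
}

// Number of bits needed to hold the largest index for a given granule shift.
constexpr size_t index_bits(size_t granule_shift) {
  return granule_shift - kObjectAlignmentSmallShift;
}
```

Under these assumptions a shift of 21 (2M) produces indices that fit in 18 bits, while a shift of 24 (16M) needs 21 bits, so any bit field sized for the former would have to be widened. Whether that is the cause of the assertion reported above is not established here.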
------------- PR: https://git.openjdk.org/jdk/pull/9582 From lzhai at openjdk.org Wed Jul 27 07:39:46 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 07:39:46 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant due to it can not be modified In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: On Thu, 21 Jul 2022 02:17:33 GMT, Leslie Zhai wrote: > Hi, > > When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: > > > const size_t ZPlatformGranuleSizeShift = 24; // 16MB > > // Granule shift/size > const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; > > // Page size shifts > const size_t ZPageSizeSmallShift = ZGranuleSizeShift; > > // Page sizes > const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; > > > `zBitField` failed to work: > > > Internal Error > (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 > # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) > > > Perhaps `mask` also need to be adjusted: > > > static size_t object_index(oop obj) { > const uintptr_t addr = ZOop::to_address(obj); > const uintptr_t offset = ZAddress::offset(addr); > const uintptr_t mask = ZGranuleSize - 1; > return (offset & mask) >> ZObjectAlignmentSmallShift; > } > > > Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. > > Please review my patch. 
>
> Thanks,
> Leslie Zhai

Hi Erik,

Thanks for your response!

> Do you have an AArch64 CPU that does not support 2M large pages, but does support 16M large pages?

Yup:

./build/linux-aarch64-server-fastdebug/images/jdk/bin/java -Xlog:gc*=debug -XX:+UseZGC -version
[0.007s][debug][gc,heap] Minimum heap 16777216 Initial heap 2147483648 Maximum heap 32178700288
[0.007s][info ][gc,init] Initializing The Z Garbage Collector
[0.007s][info ][gc,init] Version: 20-internal-adhoc.root.jdk (fastdebug)
[0.007s][info ][gc,init] Probing address space for the highest valid bit: 47
[0.007s][info ][gc,init] NUMA Support: Enabled
[0.007s][info ][gc,init] NUMA Nodes: 4
[0.007s][info ][gc,init] CPUs: 96 total, 96 available
[0.007s][info ][gc,init] Memory: 292969M
[0.007s][info ][gc,init] Large Page Support: Disabled
[0.007s][info ][gc,init] GC Workers: 24 (dynamic)
[0.009s][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
[0.009s][info ][gc,init] Address Space Size: 491008M x 3 = 1473024M
[0.009s][info ][gc,init] Heap Backing File: /memfd:java_heap
[0.009s][info ][gc,init] Heap Backing Filesystem: tmpfs (0x1021994)
[0.009s][info ][gc,init] Min Capacity: 16M
[0.009s][info ][gc,init] Initial Capacity: 2048M
[0.009s][info ][gc,init] Max Capacity: 30688M
[0.009s][info ][gc,init] Medium Page Size: 256M
[0.009s][info ][gc,init] Pre-touch: Disabled
[0.009s][info ][gc,init] Available space on backing filesystem: N/A
[0.009s][info ][gc,init] Uncommit: Enabled
[0.009s][info ][gc,init] Uncommit Delay: 300s
[0.036s][debug][gc,marking] Expanding mark stack space: 0M->32M
[0.036s][info ][gc,init ] Runtime Workers: 38
[0.039s][info ][gc ] Using The Z Garbage Collector
[0.039s][info ][gc,metaspace] CDS archive(s) mapped at: [0x0000000800000000-0x0000000800cf0000-0x0000000800cf0000), size 13565952, SharedBaseAddress: 0x0000000800000000, ArchiveRelocationMode: 0.
[0.039s][info ][gc,metaspace] Compressed class space mapped at: 0x0000000801000000-0x0000000841000000, reserved size: 1073741824
[0.039s][info ][gc,metaspace] Narrow klass base: 0x0000000800000000, Narrow klass shift: 0, Narrow klass range: 0x100000000
[0.088s][debug][gc,heap ] Uncommit Timeout: 301s
[0.094s][debug][gc,nmethod ] Rebuilding NMethod Table: 0->1024 entries, 0(0%->0%) registered, 0(0%->0%) unregistered
openjdk version "20-internal" 2023-03-21
OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.root.jdk)
OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.root.jdk, mixed mode, sharing)
[0.132s][info ][gc,heap,exit] Heap
[0.132s][info ][gc,heap,exit] ZHeap used 16M, capacity 2048M, max capacity 30688M
[0.132s][info ][gc,heap,exit] Metaspace used 130K, committed 256K, reserved 1114112K
[0.132s][info ][gc,heap,exit] class space used 2K, committed 64K, reserved 1048576K

The file descriptor was created successfully: see the `Heap Backing File` line in the -Xlog output above, which is logged at the line marked `=>` in the following code:

int ZPhysicalMemoryBacking::create_mem_fd(const char* name) const {
#ifdef AARCH64
  assert(ZGranuleSize == 16 * M, "Granule size must match MFD_HUGE_16MB");
#else
  assert(ZGranuleSize == 2 * M, "Granule size must match MFD_HUGE_2MB");
#endif

  // Create file name
  char filename[PATH_MAX];
  snprintf(filename, sizeof(filename), "%s%s", name, ZLargePages::is_explicit() ? ".hugetlb" : "");

  // Create file
#ifdef AARCH64
  const int extra_flags = ZLargePages::is_explicit() ? (MFD_HUGETLB | MFD_HUGE_16MB) : 0;
#else
  const int extra_flags = ZLargePages::is_explicit() ? (MFD_HUGETLB | MFD_HUGE_2MB) : 0;
#endif
  const int fd = ZSyscall::memfd_create(filename, MFD_CLOEXEC | extra_flags);
  if (fd == -1) {
    ZErrno err;
    log_debug_p(gc, init)("Failed to create memfd file (%s)",
                          (ZLargePages::is_explicit() && (err == EINVAL || err == ENODEV)) ?
                          "Hugepages (2M) not available" : err.to_string());
    return -1;
  }

=> log_info_p(gc, init)("Heap Backing File: /memfd:%s", filename);

  return fd;
}

> what is the actual problem that you are trying to solve?

Back to the topic: I just want to remove `ZPlatformGranuleSizeShift` from the CPU backends. I will also update the AArch64 debugging part of the patch.

Thanks,
Leslie Zhai

-------------

PR: https://git.openjdk.org/jdk/pull/9582

From duke at openjdk.org Wed Jul 27 07:53:10 2022
From: duke at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 27 Jul 2022 07:53:10 GMT
Subject: RFR: 8290074: Remove implicit arguments for RegisterMap constructor
In-Reply-To: <8txeHYkmwTumyrOClXP67vNUaCQYzDgiI2qIG-Qk3Jg=.6cc0c63b-b85c-4ca3-b418-e80ba2b1ea38@github.com>
References: <8txeHYkmwTumyrOClXP67vNUaCQYzDgiI2qIG-Qk3Jg=.6cc0c63b-b85c-4ca3-b418-e80ba2b1ea38@github.com>
Message-ID:

On Tue, 26 Jul 2022 16:25:04 GMT, Thomas Schatzl wrote:

>> Currently the `RegisterMap` constructor uses implicit boolean arguments to configure its function. Implicit boolean arguments make code harder to understand and reason about at the call site. Using explicit scoped enums instead makes it clear what is being configured, and the type safety makes mistakes less likely.
>>
>> Update `RegisterMap` constructors to use these scoped enum types instead of booleans.
>> ```C++
>> enum class UpdateMap { skip, yes };
>> enum class ProcessFrames { skip, yes };
>> enum class WalkContinuation { skip, yes };
>> ```
>>
>> Testing: tier1-3
>
> src/hotspot/share/runtime/registerMap.hpp line 75:
>
>> 73: enum class UpdateMap { skip, yes };
>> 74: enum class ProcessFrames { skip, yes };
>> 75: enum class WalkContinuation { skip, yes };
>
> Instead of `yes` I would recommend using something like `include` (or `add`), as `yes` seems relatively unspecific compared to `skip`.

`include` is probably better. I was trying to think of a good word to use; first I thought of `do` and `skip` because the enums are actions/verbs. But the reverse order (as in `UpdateMap :: do`) sounds a bit weird. I thought of it more as a question instead, `Action? :: yes/skip`. But `include` fits better with respect to `skip`; `yes`/`no` is an alternative, but `skip`/`include` is nicer. It is also an excuse to fix that double space on line 75.

-------------

PR: https://git.openjdk.org/jdk/pull/9455

From tschatzl at openjdk.org Wed Jul 27 08:04:30 2022
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 27 Jul 2022 08:04:30 GMT
Subject: RFR: 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 11 Jul 2022 11:03:26 GMT, Ivan Walulya wrote:

>> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>>
>> iwalulya review
>
> Lgtm!

Thanks @walulyai @sangheon for your reviews

-------------

PR: https://git.openjdk.org/jdk/pull/9425

From tschatzl at openjdk.org Wed Jul 27 08:04:31 2022
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 27 Jul 2022 08:04:31 GMT
Subject: Integrated: 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize
In-Reply-To:
References:
Message-ID:

On Fri, 8 Jul 2022 08:39:21 GMT, Thomas Schatzl wrote:

> Hi all,
>
> can I get reviews for this enhancement fixing a sometimes annoying UI issue where setting `-XX:MinTLABSize` does not automatically update `-XX:YoungPLABSize` and `-XX:OldPLABSize` if they are not set?
> This avoids some unnecessary retries.
>
> Testing: gha, test case
>
> Thanks,
> Thomas

This pull request has now been integrated.
Changeset: 2a1d9cfe Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/2a1d9cfeada7e98cbb679b1bf88a91b722471517 Stats: 94 lines in 4 files changed: 93 ins; 0 del; 1 mod 8289137: Automatically adapt Young/OldPLABSize and when setting only MinTLABSize Reviewed-by: iwalulya, sangheki ------------- PR: https://git.openjdk.org/jdk/pull/9425 From eosterlund at openjdk.org Wed Jul 27 08:13:02 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 27 Jul 2022 08:13:02 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant due to it can not be modified In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: <2hCDQK-6DoA9zsSvHR_UyvjOI2PcBeWy6nxeSrO7JdM=.01092caf-58a0-4853-a111-90aa53e89ca8@github.com> On Thu, 21 Jul 2022 02:17:33 GMT, Leslie Zhai wrote: > Hi, > > When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: > > > const size_t ZPlatformGranuleSizeShift = 24; // 16MB > > // Granule shift/size > const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; > > // Page size shifts > const size_t ZPageSizeSmallShift = ZGranuleSizeShift; > > // Page sizes > const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; > > > `zBitField` failed to work: > > > Internal Error > (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 > # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) > > > Perhaps `mask` also need to be adjusted: > > > static size_t object_index(oop obj) { > const uintptr_t addr = ZOop::to_address(obj); > 
const uintptr_t offset = ZAddress::offset(addr); > const uintptr_t mask = ZGranuleSize - 1; > return (offset & mask) >> ZObjectAlignmentSmallShift; > } > > > Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. > > Please review my patch. > > Thanks, > Leslie Zhai Okay please update your patch so I can see what changes you are actually proposing to integrate. ------------- PR: https://git.openjdk.org/jdk/pull/9582 From njian at openjdk.org Wed Jul 27 08:23:05 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Wed, 27 Jul 2022 08:23:05 GMT Subject: RFR: 8284990: AArch64: Remove STXR_PREFETCH from CPU features In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 14:59:19 GMT, Eric Liu wrote: > As STXR_PREFETCH is usually done unconditionally in non-JVM code, e.g., > Linux ARM Kernel[1], this patch removes VM_Version::CPU_STXR_PREFETCH > and generates it unconditionally. > > [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1436779519-2232-16-git-send-email-will.deacon at arm.com/ > > [TEST] > Full Jtreg passed without new failure. Marked as reviewed by njian (Committer). 
------------- PR: https://git.openjdk.org/jdk/pull/9641 From lzhai at openjdk.org Wed Jul 27 08:33:04 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 08:33:04 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant due to it can not be modified [v2] In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: <30CT6vngfr7KMc4jwt_F51C9_BzK9eRigAFjVOBpbMk=.797ec2fa-0cb6-4c46-b8d8-b6873b590ab3@github.com> > Hi, > > When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: > > > const size_t ZPlatformGranuleSizeShift = 24; // 16MB > > // Granule shift/size > const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; > > // Page size shifts > const size_t ZPageSizeSmallShift = ZGranuleSizeShift; > > // Page sizes > const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; > > > `zBitField` failed to work: > > > Internal Error > (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 > # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) > > > Perhaps `mask` also need to be adjusted: > > > static size_t object_index(oop obj) { > const uintptr_t addr = ZOop::to_address(obj); > const uintptr_t offset = ZAddress::offset(addr); > const uintptr_t mask = ZGranuleSize - 1; > return (offset & mask) >> ZObjectAlignmentSmallShift; > } > > > Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the 
issue. > > Please review my patch. > > Thanks, > Leslie Zhai Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: 8291106: ZPlatformGranuleSizeShift is redundant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9582/files - new: https://git.openjdk.org/jdk/pull/9582/files/66c4e70a..78085135 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=00-01 Stats: 16 lines in 3 files changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9582/head:pull/9582 PR: https://git.openjdk.org/jdk/pull/9582 From lzhai at openjdk.org Wed Jul 27 08:37:31 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 08:37:31 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant due to it can not be modified [v3] In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> > Hi, > > When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: > > > const size_t ZPlatformGranuleSizeShift = 24; // 16MB > > // Granule shift/size > const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; > > // Page size shifts > const size_t ZPageSizeSmallShift = ZGranuleSizeShift; > > // Page sizes > const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; > > > `zBitField` failed to work: > > > Internal Error > (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 > # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) > # 
Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) > > > Perhaps `mask` also need to be adjusted: > > > static size_t object_index(oop obj) { > const uintptr_t addr = ZOop::to_address(obj); > const uintptr_t offset = ZAddress::offset(addr); > const uintptr_t mask = ZGranuleSize - 1; > return (offset & mask) >> ZObjectAlignmentSmallShift; > } > > > Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. > > Please review my patch. > > Thanks, > Leslie Zhai Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: 8291106: ZPlatformGranuleSizeShift is redundant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9582/files - new: https://git.openjdk.org/jdk/pull/9582/files/78085135..cf6032ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9582/head:pull/9582 PR: https://git.openjdk.org/jdk/pull/9582 From eosterlund at openjdk.org Wed Jul 27 08:44:17 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 27 Jul 2022 08:44:17 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v3] In-Reply-To: <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> Message-ID: On Wed, 27 Jul 2022 08:37:31 GMT, Leslie Zhai wrote: >> Hi, >> >> When I 
am tunning `ZPageSizeSmall` to 16 MB from 2MB: >> >> >> const size_t ZPlatformGranuleSizeShift = 24; // 16MB >> >> // Granule shift/size >> const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; >> >> // Page size shifts >> const size_t ZPageSizeSmallShift = ZGranuleSizeShift; >> >> // Page sizes >> const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; >> >> >> `zBitField` failed to work: >> >> >> Internal Error >> (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 >> # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value >> # >> # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) >> >> >> Perhaps `mask` also need to be adjusted: >> >> >> static size_t object_index(oop obj) { >> const uintptr_t addr = ZOop::to_address(obj); >> const uintptr_t offset = ZAddress::offset(addr); >> const uintptr_t mask = ZGranuleSize - 1; >> return (offset & mask) >> ZObjectAlignmentSmallShift; >> } >> >> >> Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. >> >> Please review my patch. >> >> Thanks, >> Leslie Zhai > > Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: > > 8291106: ZPlatformGranuleSizeShift is redundant Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9582 From lzhai at openjdk.org Wed Jul 27 08:44:17 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 08:44:17 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v3] In-Reply-To: <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> Message-ID: On Wed, 27 Jul 2022 08:37:31 GMT, Leslie Zhai wrote: >> Hi, >> >> When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: >> >> >> const size_t ZPlatformGranuleSizeShift = 24; // 16MB >> >> // Granule shift/size >> const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; >> >> // Page size shifts >> const size_t ZPageSizeSmallShift = ZGranuleSizeShift; >> >> // Page sizes >> const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; >> >> >> `zBitField` failed to work: >> >> >> Internal Error >> (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 >> # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value >> # >> # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) >> >> >> Perhaps `mask` also need to be adjusted: >> >> >> static size_t object_index(oop obj) { >> const uintptr_t addr = ZOop::to_address(obj); >> const uintptr_t offset = ZAddress::offset(addr); >> const uintptr_t mask = ZGranuleSize - 1; >> return (offset & mask) >> ZObjectAlignmentSmallShift; >> } >> >> >> Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu 
backends, just keep it for debugging AArch64 to see the issue. >> >> Please review my patch. >> >> Thanks, >> Leslie Zhai > > Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: > > 8291106: ZPlatformGranuleSizeShift is redundant Hi Erik, Please review it again. Thanks, Leslie Zhai Hi Erik, Thanks for your review. Cheers, Leslie Zhai ------------- PR: https://git.openjdk.org/jdk/pull/9582 From jiefu at openjdk.org Wed Jul 27 08:52:04 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 27 Jul 2022 08:52:04 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v3] In-Reply-To: <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> Message-ID: <-K71YW4d7w0lqmFkkFp-4AVgnXtPgMahpnIVVHqM4pA=.1b14f89b-a38e-43e3-b5dc-b013aed5998e@github.com> On Wed, 27 Jul 2022 08:37:31 GMT, Leslie Zhai wrote: >> Hi, >> >> When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: >> >> >> const size_t ZPlatformGranuleSizeShift = 24; // 16MB >> >> // Granule shift/size >> const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; >> >> // Page size shifts >> const size_t ZPageSizeSmallShift = ZGranuleSizeShift; >> >> // Page sizes >> const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; >> >> >> `zBitField` failed to work: >> >> >> Internal Error >> (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 >> # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value >> # >> # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, 
linux-aarch64) >> >> >> Perhaps `mask` also need to be adjusted: >> >> >> static size_t object_index(oop obj) { >> const uintptr_t addr = ZOop::to_address(obj); >> const uintptr_t offset = ZAddress::offset(addr); >> const uintptr_t mask = ZGranuleSize - 1; >> return (offset & mask) >> ZObjectAlignmentSmallShift; >> } >> >> >> Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. >> >> Please review my patch. >> >> Thanks, >> Leslie Zhai > > Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: > > 8291106: ZPlatformGranuleSizeShift is redundant Maybe, we should also update the copyright year? ------------- PR: https://git.openjdk.org/jdk/pull/9582 From lzhai at openjdk.org Wed Jul 27 08:55:02 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 08:55:02 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v4] In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: > Hi, > > When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: > > > const size_t ZPlatformGranuleSizeShift = 24; // 16MB > > // Granule shift/size > const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; > > // Page size shifts > const size_t ZPageSizeSmallShift = ZGranuleSizeShift; > > // Page sizes > const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; > > > `zBitField` failed to work: > > > Internal Error > (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 > # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug 
build 20-internal-adhoc.root.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) > > > Perhaps `mask` also need to be adjusted: > > > static size_t object_index(oop obj) { > const uintptr_t addr = ZOop::to_address(obj); > const uintptr_t offset = ZAddress::offset(addr); > const uintptr_t mask = ZGranuleSize - 1; > return (offset & mask) >> ZObjectAlignmentSmallShift; > } > > > Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. > > Please review my patch. > > Thanks, > Leslie Zhai Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: 8291106: ZPlatformGranuleSizeShift is redundant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9582/files - new: https://git.openjdk.org/jdk/pull/9582/files/cf6032ce..62b3f871 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=02-03 Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/9582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9582/head:pull/9582 PR: https://git.openjdk.org/jdk/pull/9582 From lzhai at openjdk.org Wed Jul 27 08:56:14 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 08:56:14 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v3] In-Reply-To: <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> <5Epoo-68w3FLpR4I8JqhHa9yKWWqccjW7po8vc6P8Zw=.9ba1ac30-d97f-4c50-b1a1-5da716cc6216@github.com> Message-ID: On Wed, 27 Jul 2022 08:37:31 GMT, Leslie Zhai wrote: >> Hi, >> >> 
When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: >> >> >> const size_t ZPlatformGranuleSizeShift = 24; // 16MB >> >> // Granule shift/size >> const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; >> >> // Page size shifts >> const size_t ZPageSizeSmallShift = ZGranuleSizeShift; >> >> // Page sizes >> const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; >> >> >> `zBitField` failed to work: >> >> >> Internal Error >> (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 >> # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value >> # >> # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) >> >> >> Perhaps `mask` also need to be adjusted: >> >> >> static size_t object_index(oop obj) { >> const uintptr_t addr = ZOop::to_address(obj); >> const uintptr_t offset = ZAddress::offset(addr); >> const uintptr_t mask = ZGranuleSize - 1; >> return (offset & mask) >> ZObjectAlignmentSmallShift; >> } >> >> >> Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. >> >> Please review my patch. >> >> Thanks, >> Leslie Zhai > > Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: > > 8291106: ZPlatformGranuleSizeShift is redundant Hi Jie Fu, Thanks for your hint! Please review it again. 
Miss you, Leslie Zhai ------------- PR: https://git.openjdk.org/jdk/pull/9582 From duke at openjdk.org Wed Jul 27 09:06:10 2022 From: duke at openjdk.org (Axel Boldt-Christmas) Date: Wed, 27 Jul 2022 09:06:10 GMT Subject: RFR: 8290062: Remove nmethodLocker for nmethods on-stack In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 07:49:09 GMT, Axel Boldt-Christmas wrote: > From JBS: > >> The nmethodLocker is pretty nasty as it prevents an nmethod from being freed, but without really keeping it alive. We would like to minimize its use. The most obvious places where it can be removed, is when "protecting" nmethods that are already on-stack. Neither the sweeper nor the GC is interested in making nmethods on-stack not live. These ones simply do not do anything. > > Removed the `nmethodLocker` where the nmethod is a caller on the stack. > > Testing: tier1-7 Tier 1-7 tests have been run. ------------- PR: https://git.openjdk.org/jdk/pull/9444 From jiefu at openjdk.org Wed Jul 27 09:09:36 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 27 Jul 2022 09:09:36 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v4] In-Reply-To: References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: On Wed, 27 Jul 2022 08:55:02 GMT, Leslie Zhai wrote: >> Hi, >> >> When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: >> >> >> const size_t ZPlatformGranuleSizeShift = 24; // 16MB >> >> // Granule shift/size >> const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; >> >> // Page size shifts >> const size_t ZPageSizeSmallShift = ZGranuleSizeShift; >> >> // Page sizes >> const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; >> >> >> `zBitField` failed to work: >> >> >> Internal Error >> (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 >> # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value >> # >> # JRE 
version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) >> >> >> Perhaps `mask` also need to be adjusted: >> >> >> static size_t object_index(oop obj) { >> const uintptr_t addr = ZOop::to_address(obj); >> const uintptr_t offset = ZAddress::offset(addr); >> const uintptr_t mask = ZGranuleSize - 1; >> return (offset & mask) >> ZObjectAlignmentSmallShift; >> } >> >> >> Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. >> >> Please review my patch. >> >> Thanks, >> Leslie Zhai > > Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: > > 8291106: ZPlatformGranuleSizeShift is redundant src/hotspot/cpu/riscv/gc/z/zGlobals_riscv.hpp line 2: > 1: /* > 2: * Copyright (c) 2015, 2019, Oracle and/or its affiliates. All rights reserved. I think we'd better modify the copyright year of Oracle's. ------------- PR: https://git.openjdk.org/jdk/pull/9582 From lzhai at openjdk.org Wed Jul 27 09:18:12 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 09:18:12 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v4] In-Reply-To: References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: On Wed, 27 Jul 2022 09:07:01 GMT, Jie Fu wrote: >> Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: >> >> 8291106: ZPlatformGranuleSizeShift is redundant > > src/hotspot/cpu/riscv/gc/z/zGlobals_riscv.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2015, 2019, Oracle and/or its affiliates. All rights reserved. 
> > I think we'd better modify the copyright year of Oracle's. Done. ------------- PR: https://git.openjdk.org/jdk/pull/9582 From lzhai at openjdk.org Wed Jul 27 09:18:09 2022 From: lzhai at openjdk.org (Leslie Zhai) Date: Wed, 27 Jul 2022 09:18:09 GMT Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v5] In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com> Message-ID: > Hi, > > When I am tunning `ZPageSizeSmall` to 16 MB from 2MB: > > > const size_t ZPlatformGranuleSizeShift = 24; // 16MB > > // Granule shift/size > const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift; > > // Page size shifts > const size_t ZPageSizeSmallShift = ZGranuleSizeShift; > > // Page sizes > const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift; > > > `zBitField` failed to work: > > > Internal Error > (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069 > # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64) > > > Perhaps `mask` also need to be adjusted: > > > static size_t object_index(oop obj) { > const uintptr_t addr = ZOop::to_address(obj); > const uintptr_t offset = ZAddress::offset(addr); > const uintptr_t mask = ZGranuleSize - 1; > return (offset & mask) >> ZObjectAlignmentSmallShift; > } > > > Back to the point: ZPlatformGranuleSizeShift is redundant due to it can not be modified, for example, `24`, so I removed the `ZPlatformGranuleSizeShift` from cpu backends, just keep it for debugging AArch64 to see the issue. > > Please review my patch. 
> > Thanks, > Leslie Zhai Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: 8291106: ZPlatformGranuleSizeShift is redundant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9582/files - new: https://git.openjdk.org/jdk/pull/9582/files/62b3f871..a8c3d671 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9582&range=03-04 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9582/head:pull/9582 PR: https://git.openjdk.org/jdk/pull/9582 From duke at openjdk.org Wed Jul 27 09:38:30 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Wed, 27 Jul 2022 09:38:30 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v11] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the generation of broadcasting a scalar in several ways: > > - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. > - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. > - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay > > With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follow: > > Before After > Benchmark Mode Cnt Score Error Score Error Units Gain > SpiltReplicate.testDouble avgt 5 42.621 ? 0.598 38.771 ? 0.797 ns/op +9.03% > SpiltReplicate.testFloat avgt 5 42.245 ? 1.464 38.603 ? 
0.367 ns/op +8.62% > SpiltReplicate.testInt avgt 5 20.581 ? 5.791 13.755 ? 0.375 ns/op +33.17% > SpiltReplicate.testLong avgt 5 17.794 ? 4.781 13.663 ? 0.387 ns/op +23.22% > > As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. > > This patch also removes some redundant code paths and renames some incorrectly named instructions. > > Thank you very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix errors regarding low sse ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7832/files - new: https://git.openjdk.org/jdk/pull/7832/files/c049d542..6193233f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=09-10 Stats: 31 lines in 1 file changed: 17 ins; 2 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/7832.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7832/head:pull/7832 PR: https://git.openjdk.org/jdk/pull/7832 From duke at openjdk.org Wed Jul 27 09:40:45 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Wed, 27 Jul 2022 09:40:45 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the generation of broadcasting a scalar in several ways: > > - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. > - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. 
> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay
>
> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>
>                                        Before             After
> Benchmark                  Mode  Cnt   Score   Error      Score   Error   Units    Gain
> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598     38.771 ± 0.797   ns/op   +9.03%
> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464     38.603 ± 0.367   ns/op   +8.62%
> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791     13.755 ± 0.375   ns/op  +33.17%
> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781     13.663 ± 0.387   ns/op  +23.22%
>
> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases.
>
> This patch also removes some redundant code paths and renames some incorrectly named instructions.
>
> Thank you very much.

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  unnecessary TEMP dst

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/7832/files
  - new: https://git.openjdk.org/jdk/pull/7832/files/6193233f..bc01c21b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=10-11

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/7832.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/7832/head:pull/7832

PR: https://git.openjdk.org/jdk/pull/7832

From eliu at openjdk.org Wed Jul 27 09:43:05 2022
From: eliu at openjdk.org (Eric Liu)
Date: Wed, 27 Jul 2022 09:43:05 GMT
Subject: Integrated: 8284990: AArch64: Remove STXR_PREFETCH from CPU features
In-Reply-To:
References:
Message-ID:

On Tue, 26 Jul 2022 14:59:19 GMT, Eric Liu wrote:

> As STXR_PREFETCH is usually done unconditionally in non-JVM code, e.g.,
> Linux ARM Kernel[1], this
patch removes VM_Version::CPU_STXR_PREFETCH > and generates it unconditionally. > > [1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1436779519-2232-16-git-send-email-will.deacon at arm.com/ > > [TEST] > Full Jtreg passed without new failure. This pull request has now been integrated. Changeset: 2bd90c21 Author: Eric Liu Committer: Nick Gasson URL: https://git.openjdk.org/jdk/commit/2bd90c2149bfee4b045c8f376e8bcdf4420ccb5d Stats: 18 lines in 5 files changed: 0 ins; 11 del; 7 mod 8284990: AArch64: Remove STXR_PREFETCH from CPU features Reviewed-by: aph, ngasson, njian ------------- PR: https://git.openjdk.org/jdk/pull/9641 From duke at openjdk.org Wed Jul 27 09:45:25 2022 From: duke at openjdk.org (Axel Boldt-Christmas) Date: Wed, 27 Jul 2022 09:45:25 GMT Subject: RFR: 8290074: Remove implicit arguments for RegisterMap constructor [v2] In-Reply-To: References: Message-ID: <2yemnBYAHLTI95rmFZnwpiK0N_XhtzBZr_jaOpTNjuA=.bc0a617f-ea88-4298-bc94-c031c679010f@github.com> > Currently the `RegisterMap` constructor uses implicit boolean arguments to configure its function. Implicit boolean arguments makes code harder to understand and reason about at the call site. Using explicit scoped enums instead makes it both clear what is being configured and the type safety makes mistakes less likely. > > Update `RegisterMap` constructors to use these scoped enum types instead of booleans. 
> ```C++ > enum class UpdateMap { skip, yes }; > enum class ProcessFrames { skip, yes }; > enum class WalkContinuation { skip, yes }; > > > Testing: tier1-3 Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Rename yes to include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9455/files - new: https://git.openjdk.org/jdk/pull/9455/files/2ba1c708..f597fcbb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9455&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9455&range=00-01 Stats: 142 lines in 35 files changed: 0 ins; 0 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/9455.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9455/head:pull/9455 PR: https://git.openjdk.org/jdk/pull/9455 From duke at openjdk.org Wed Jul 27 09:49:06 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Wed, 27 Jul 2022 09:49:06 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v10] In-Reply-To: References: Message-ID: <72OWdtfwNfhRBroOvJ1-EgVIsSGGCnnhdBDkcGUDd4w=.4ae9e3bb-2934-41d6-bf14-b44054ef97b1@github.com> On Tue, 26 Jul 2022 16:40:57 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> replI_mem > > The testing of version 07 got failure when run vector tests with `-XX:UseAVX=0 -XX:UseSSE=2`: > > # Internal Error (/workspace/open/src/hotspot/share/opto/constantTable.cpp:217), pid=2750036, tid=2750067 > # assert((constant_addr - _masm.code()->consts()->start()) == con.offset()) failed: must be: 8 == 0 > > Current CompileTask: > C2: 287 29 % b compiler.codegen.TestByteVect::test_ci @ 2 (20 bytes) > > Stack: [0x00007f7abf144000,0x00007f7abf245000], sp=0x00007f7abf23fa30, free space=1006k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xb731c8] ConstantTable::emit(CodeBuffer&) const+0x1c8 > V [libjvm.so+0x17c3673] 
PhaseOutput::fill_buffer(CodeBuffer*, unsigned int*)+0x293 > V [libjvm.so+0xb191bb] Compile::Code_Gen()+0x42b > V [libjvm.so+0xb1e899] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1699 > > > and > > > # Internal Error (/workspace/open/src/hotspot/cpu/x86/assembler_x86.cpp:5095), pid=1431469, tid=1431493 > # Error: assert(VM_Version::supports_ssse3()) failed > > Current CompileTask: > C2: 468 240 % b 4 java.util.Arrays::fill @ 5 (21 bytes) > > Stack: [0x00007fdecd422000,0x00007fdecd523000], sp=0x00007fdecd51d8c0, free space=1006k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x73079c] Assembler::pshufb(XMMRegisterImpl*, XMMRegisterImpl*)+0x13c > V [libjvm.so+0x4005d1] ReplB_regNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x1a1 > V [libjvm.so+0x17be04e] PhaseOutput::scratch_emit_size(Node const*)+0x45e > V [libjvm.so+0x17b4548] PhaseOutput::shorten_branches(unsigned int*)+0x2d8 > V [libjvm.so+0x17c6faa] PhaseOutput::Output()+0xcfa > V [libjvm.so+0xb191bb] Compile::Code_Gen()+0x42b > V [libjvm.so+0xb1e899] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1699 @vnkozlov I have fixed those errors in the last commits. The second one is due to `pshufb` being supported only on ssse3 machines. And the first one is because the constant table itself is not aligned enough given that currently, it is only aligned at 8 bytes. I chose to avoid the problem and only emit constants requiring at most 8 bytes of alignment as this patch has already touched many areas. A proper solution would be in a separate issue. What do you think? Thanks a lot. 
https://github.com/openjdk/jdk/blob/2bd90c2149bfee4b045c8f376e8bcdf4420ccb5d/src/hotspot/share/asm/codeBuffer.hpp#L742

-------------

PR: https://git.openjdk.org/jdk/pull/7832

From duke at openjdk.org Wed Jul 27 10:58:02 2022
From: duke at openjdk.org (Evgeny Astigeevich)
Date: Wed, 27 Jul 2022 10:58:02 GMT
Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 26 Jul 2022 14:44:01 GMT, Andrew Haley wrote:

>> Hi @theRealAph,
>> I am sorry I did not get your comment. Could you please explain it?
>>
>> Thanks,
>> Evgeny
>
> The addition is
> 'PhaseOutput* phase_output = Compile::current()->output();'
> then
> 'phase_output != NULL && phase_output->in_scratch_emit_size()'
>
> so AFAICS `Compile::current()->output()` is now checked for null, where it was not before.

Now I get it. Thank you.
I agree this looks suspicious. I could not recall why I added it. Debugging helped me to find out.
During the parsing phase of a C2 compilation, `ciTypeFlow::StateVector::do_invoke` ends up calling `LinkResolver::resolve_static_call`, which now has the following code:

    if (resolved_method->is_continuation_enter_intrinsic()
        && resolved_method->from_interpreted_entry() == NULL) { // does a load_acquire
      methodHandle mh(THREAD, resolved_method);
      // Generate a compiled form of the enterSpecial intrinsic.
      AdapterHandlerLibrary::create_native_wrapper(mh);
    }

We generate a wrapper, which is an `nmethod` with trampoline calls. As we are still in the parsing phase, the output has not been created yet.
I can move `Compile::current()->output() != NULL` into the preceding IF and update the comment to the following:

    Make sure this is code generation of a C2 compilation when Compile::current()->output() is not NULL.
    C2 can generate native wrappers for the continuation enter intrinsic before code generation.
    C1 allocates space only for trampoline stubs generated by Call LIR ops.
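The null check discussed above can be sketched in isolation. This is a minimal illustration only: `PhaseOutput` and `Compile` here are hypothetical stand-ins that model just the two calls quoted in the thread (`output()` and `in_scratch_emit_size()`), and `is_scratch_emit` is an invented helper name, not a HotSpot function.

```cpp
#include <cassert>

// Hypothetical stand-in: models only in_scratch_emit_size() from the thread.
struct PhaseOutput {
  bool scratch_emit = false;
  bool in_scratch_emit_size() const { return scratch_emit; }
};

// Hypothetical stand-in: output() stays null until code generation starts,
// which is the situation hit when a native wrapper is built during parsing.
struct Compile {
  PhaseOutput* out = nullptr;
  PhaseOutput* output() const { return out; }
};

// Assumed helper name. Thanks to short-circuit evaluation, the member call
// on phase_output is never made while the output phase does not exist.
inline bool is_scratch_emit(const Compile& c) {
  PhaseOutput* phase_output = c.output();
  return phase_output != nullptr && phase_output->in_scratch_emit_size();
}
```

With a default-constructed `Compile` (the parsing-phase case) the helper returns `false` without dereferencing anything; once `out` points at a `PhaseOutput`, the result simply follows `in_scratch_emit_size()`.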
------------- PR: https://git.openjdk.org/jdk/pull/9592 From thartmann at openjdk.org Wed Jul 27 11:50:10 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 27 Jul 2022 11:50:10 GMT Subject: RFR: 8291002: Rename Method::build_interpreter_method_data to Method::build_profiling_method_data In-Reply-To: References: Message-ID: <7RdpDjKe-aSRWUceJijcvti-_PW3W7LIW3CWEklSpa4=.35e8ecba-e929-4a21-8d7b-5760bfdf3e6f@github.com> On Tue, 26 Jul 2022 08:45:59 GMT, Julian Waters wrote: > As mentioned in the review process for [JDK-8290834](https://bugs.openjdk.org/browse/JDK-8290834) `build_interpreter_method_data` is misleading because it is actually used for creating MethodData*s throughout HotSpot, not just in the interpreter. Renamed the method to `build_profiling_method_data` instead to more accurately describe what it is used for. Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9637 From jwaters at openjdk.org Wed Jul 27 11:56:02 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 27 Jul 2022 11:56:02 GMT Subject: Integrated: 8291002: Rename Method::build_interpreter_method_data to Method::build_profiling_method_data In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 08:45:59 GMT, Julian Waters wrote: > As mentioned in the review process for [JDK-8290834](https://bugs.openjdk.org/browse/JDK-8290834) `build_interpreter_method_data` is misleading because it is actually used for creating MethodData*s throughout HotSpot, not just in the interpreter. Renamed the method to `build_profiling_method_data` instead to more accurately describe what it is used for. This pull request has now been integrated. 
Changeset: 8ec31976 Author: Julian Waters Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8ec319768399ba83a3ac04c2034666216ebc9cba Stats: 12 lines in 9 files changed: 0 ins; 0 del; 12 mod 8291002: Rename Method::build_interpreter_method_data to Method::build_profiling_method_data Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/9637 From tschatzl at openjdk.org Wed Jul 27 12:22:54 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 27 Jul 2022 12:22:54 GMT Subject: RFR: 8290074: Remove implicit arguments for RegisterMap constructor [v2] In-Reply-To: <2yemnBYAHLTI95rmFZnwpiK0N_XhtzBZr_jaOpTNjuA=.bc0a617f-ea88-4298-bc94-c031c679010f@github.com> References: <2yemnBYAHLTI95rmFZnwpiK0N_XhtzBZr_jaOpTNjuA=.bc0a617f-ea88-4298-bc94-c031c679010f@github.com> Message-ID: On Wed, 27 Jul 2022 09:45:25 GMT, Axel Boldt-Christmas wrote: >> Currently the `RegisterMap` constructor uses implicit boolean arguments to configure its function. Implicit boolean arguments makes code harder to understand and reason about at the call site. Using explicit scoped enums instead makes it both clear what is being configured and the type safety makes mistakes less likely. >> >> Update `RegisterMap` constructors to use these scoped enum types instead of booleans. >> ```C++ >> enum class UpdateMap { skip, yes }; >> enum class ProcessFrames { skip, yes }; >> enum class WalkContinuation { skip, yes }; >> >> >> Testing: tier1-3 > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Rename yes to include Lgtm. Thanks. ------------- Marked as reviewed by tschatzl (Reviewer). 
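The readability argument in the quoted description (scoped enums instead of bare booleans at the call site) can be illustrated with a small sketch. `RegisterMapSketch` and `make_map_for_stack_walk` are hypothetical names, not HotSpot code, and the real constructor takes more state than this; the enumerator `include` is assumed from the "Rename yes to include" commit mentioned above.

```cpp
#include <cassert>

// Scoped enums as named in the PR description (with yes renamed to include).
enum class UpdateMap { skip, include };
enum class ProcessFrames { skip, include };
enum class WalkContinuation { skip, include };

// Hypothetical miniature of a class configured by three independent switches.
class RegisterMapSketch {
  bool _update_map;
  bool _process_frames;
  bool _walk_continuation;

 public:
  // Each parameter has its own type, so swapping two arguments by mistake
  // is a compile error instead of a silent behaviour change.
  RegisterMapSketch(UpdateMap u, ProcessFrames p, WalkContinuation w)
      : _update_map(u == UpdateMap::include),
        _process_frames(p == ProcessFrames::include),
        _walk_continuation(w == WalkContinuation::include) {}

  bool update_map() const { return _update_map; }
  bool process_frames() const { return _process_frames; }
  bool walk_continuation() const { return _walk_continuation; }
};

// The call site documents itself; compare with a bare (false, true, false).
inline RegisterMapSketch make_map_for_stack_walk() {
  return RegisterMapSketch(UpdateMap::skip,
                           ProcessFrames::include,
                           WalkContinuation::skip);
}
```

Because scoped enums do not convert implicitly to `bool` or to each other, a call such as `RegisterMapSketch(ProcessFrames::include, UpdateMap::skip, WalkContinuation::skip)` is rejected by the compiler.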
PR: https://git.openjdk.org/jdk/pull/9455

From eosterlund at openjdk.org Wed Jul 27 12:53:10 2022
From: eosterlund at openjdk.org (Erik Österlund)
Date: Wed, 27 Jul 2022 12:53:10 GMT
Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v5]
In-Reply-To:
References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com>
Message-ID: <9B5zHeZYZHKJ_pFB6ace9sYfqwvIkEGv3qWShnDYB2I=.fdf023b8-cb54-4569-9e55-ba73ae518b1a@github.com>

On Wed, 27 Jul 2022 09:18:09 GMT, Leslie Zhai wrote:

>> Hi,
>>
>> When I am tuning `ZPageSizeSmall` to 16 MB from 2 MB:
>>
>>     const size_t ZPlatformGranuleSizeShift = 24; // 16MB
>>
>>     // Granule shift/size
>>     const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift;
>>
>>     // Page size shifts
>>     const size_t ZPageSizeSmallShift = ZGranuleSizeShift;
>>
>>     // Page sizes
>>     const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift;
>>
>> `zBitField` failed to work:
>>
>>     Internal Error
>>     (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069
>>     # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value
>>     #
>>     # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk)
>>     # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64)
>>
>> Perhaps `mask` also needs to be adjusted:
>>
>>     static size_t object_index(oop obj) {
>>       const uintptr_t addr = ZOop::to_address(obj);
>>       const uintptr_t offset = ZAddress::offset(addr);
>>       const uintptr_t mask = ZGranuleSize - 1;
>>       return (offset & mask) >> ZObjectAlignmentSmallShift;
>>     }
>>
>> Back to the point: ZPlatformGranuleSizeShift is redundant because it cannot be modified (for example, to `24`), so I removed `ZPlatformGranuleSizeShift` from the CPU backends and kept it only on AArch64, for debugging, to show the issue.
>> >> Please review my patch. >> >> Thanks, >> Leslie Zhai > > Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision: > > 8291106: ZPlatformGranuleSizeShift is redundant Marked as reviewed by eosterlund (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9582 From duke at openjdk.org Wed Jul 27 12:54:36 2022 From: duke at openjdk.org (Axel Boldt-Christmas) Date: Wed, 27 Jul 2022 12:54:36 GMT Subject: Integrated: 8290074: Remove implicit arguments for RegisterMap constructor In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 14:58:07 GMT, Axel Boldt-Christmas wrote: > Currently the `RegisterMap` constructor uses implicit boolean arguments to configure its function. Implicit boolean arguments makes code harder to understand and reason about at the call site. Using explicit scoped enums instead makes it both clear what is being configured and the type safety makes mistakes less likely. > > Update `RegisterMap` constructors to use these scoped enum types instead of booleans. > ```C++ > enum class UpdateMap { skip, yes }; > enum class ProcessFrames { skip, yes }; > enum class WalkContinuation { skip, yes }; > > > Testing: tier1-3 This pull request has now been integrated. 
Changeset: 2f3e494b
Author:    Axel Boldt-Christmas
Committer: Erik Österlund
URL:       https://git.openjdk.org/jdk/commit/2f3e494b80cce8e357ceac9a897c42d7e8f54af5
Stats:     434 lines in 40 files changed: 317 ins; 0 del; 117 mod

8290074: Remove implicit arguments for RegisterMap constructor

Reviewed-by: eosterlund, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/9455

From duke at openjdk.org Wed Jul 27 13:37:38 2022
From: duke at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 27 Jul 2022 13:37:38 GMT
Subject: RFR: 8291237: Encapsulate nmethod Deoptimization logic
Message-ID:

The proposal is to encapsulate the nmethod mark-for-deoptimization logic in one place and only allow access to `mark_for_deoptimization` from a closure object:

```C++
class DeoptimizationMarkerClosure : StackObj {
 public:
  virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0;
};
```

This closure takes a `MarkFn`, which it uses to mark the nmethods that should be deoptimized. This marking can only be done through the `MarkFn`, and a `MarkFn` can only be created in the following code, which runs the closure.

```C++
{
  NoSafepointVerifier nsv;
  assert_locked_or_safepoint(Compile_lock);
  marker_closure.marker_do(MarkFn());
  anything_deoptimized = deoptimize_all_marked();
}
if (anything_deoptimized) {
  run_deoptimize_closure();
}
```

This ensures that this logic is encapsulated, and the `NoSafepointVerifier` and `assert_locked_or_safepoint(Compile_lock)` make it sound for `deoptimize_all_marked` not to scan the whole code cache. The exception to this pattern, from `InstanceKlass::unload_class`, is discussed in the JBS issue, which gives the reasons why not marking for deoptimization there is OK.

An effect of this encapsulation is that the deoptimization logic was moved from the `CodeCache` class to the `Deoptimization` class and the class redefinition logic was moved from the `CodeCache` class to the `VM_RedefineClasses` class/operation.
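One way to get the "a `MarkFn` can only be created by the code that runs the closure" property is the passkey idiom. The sketch below is a self-contained miniature under that assumption; it is not the code from the pull request. `NMethod`, `MarkerClosure`, `mark_and_deoptimize` and `MarkEven` are hypothetical names, and the safepoint and `Compile_lock` checks are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of an nmethod: only the mark bit.
struct NMethod {
  bool marked = false;
};

class Deoptimization {
 public:
  // Passkey: the constructor is private and only Deoptimization is a friend,
  // so code outside this class cannot conjure a MarkFn and mark on its own.
  class MarkFn {
    friend class Deoptimization;
    MarkFn() {}

   public:
    void operator()(NMethod& nm) const { nm.marked = true; }
  };

  class MarkerClosure {
   public:
    virtual ~MarkerClosure() = default;
    virtual void marker_do(MarkFn mark_fn) = 0;
  };

  // Runs the closure, then processes everything it marked in the same step.
  // Returns how many nmethods were marked (and are now unmarked again).
  static int mark_and_deoptimize(MarkerClosure& closure,
                                 std::vector<NMethod>& code_cache) {
    closure.marker_do(MarkFn());
    int deoptimized = 0;
    for (NMethod& nm : code_cache) {
      if (nm.marked) {
        nm.marked = false;
        ++deoptimized;
      }
    }
    return deoptimized;
  }
};

// Example closure: marks every nmethod at an even index.
class MarkEven : public Deoptimization::MarkerClosure {
  std::vector<NMethod>& _cache;

 public:
  explicit MarkEven(std::vector<NMethod>& cache) : _cache(cache) {}
  void marker_do(Deoptimization::MarkFn mark_fn) override {
    for (std::size_t i = 0; i < _cache.size(); i += 2) {
      mark_fn(_cache[i]);
    }
  }
};
```

Since marking and processing happen inside one function, no mark can be left behind between the two steps, which is the property the description relies on. A closure could still copy the `MarkFn` it receives and keep it, so in this sketch the guarantee is a convention enforced by the type, not an airtight one.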
Testing: Tier 1-5 ------------- Commit messages: - JDK-8291237: Encapsulate nmethod Deoptimization logic Changes: https://git.openjdk.org/jdk/pull/9655/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9655&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291237 Stats: 719 lines in 21 files changed: 362 ins; 250 del; 107 mod Patch: https://git.openjdk.org/jdk/pull/9655.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9655/head:pull/9655 PR: https://git.openjdk.org/jdk/pull/9655 From rsunderbabu at openjdk.org Wed Jul 27 14:38:50 2022 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Wed, 27 Jul 2022 14:38:50 GMT Subject: RFR: 8289764: gc/lock tests failed with "OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects" Message-ID: Tested with all GC options ------------- Commit messages: - 8289764: gc/lock tests failed with "OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects" - initial commit Changes: https://git.openjdk.org/jdk/pull/9658/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9658&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289764 Stats: 1318 lines in 49 files changed: 1 ins; 1282 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/9658.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9658/head:pull/9658 PR: https://git.openjdk.org/jdk/pull/9658 From rsunderbabu at openjdk.org Wed Jul 27 14:38:51 2022 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Wed, 27 Jul 2022 14:38:51 GMT Subject: RFR: 8289764: gc/lock tests failed with "OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects" In-Reply-To: References: Message-ID: On Wed, 27 Jul 2022 14:03:34 GMT, Ramkumar Sunderbabu wrote: > Tested with all GC options Summary of changes 1. Remove explicit creation of garbage, instead call GC directly. 2. Retain the class of functions being tested - jni, jniref, jvmti, malloc. 3. 
Only one test is retained for every class of function. ------------- PR: https://git.openjdk.org/jdk/pull/9658 From kvn at openjdk.org Wed Jul 27 15:10:00 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 27 Jul 2022 15:10:00 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v10] In-Reply-To: References: Message-ID: <1GecXeb51hK7x9nrbXnhnAKlY_u5eDnMMapdJ8RoGN4=.471c903d-e659-4bb1-915b-b8615915dd84@github.com> On Tue, 26 Jul 2022 16:40:57 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> replI_mem > > The testing of version 07 got failure when run vector tests with `-XX:UseAVX=0 -XX:UseSSE=2`: > > # Internal Error (/workspace/open/src/hotspot/share/opto/constantTable.cpp:217), pid=2750036, tid=2750067 > # assert((constant_addr - _masm.code()->consts()->start()) == con.offset()) failed: must be: 8 == 0 > > Current CompileTask: > C2: 287 29 % b compiler.codegen.TestByteVect::test_ci @ 2 (20 bytes) > > Stack: [0x00007f7abf144000,0x00007f7abf245000], sp=0x00007f7abf23fa30, free space=1006k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xb731c8] ConstantTable::emit(CodeBuffer&) const+0x1c8 > V [libjvm.so+0x17c3673] PhaseOutput::fill_buffer(CodeBuffer*, unsigned int*)+0x293 > V [libjvm.so+0xb191bb] Compile::Code_Gen()+0x42b > V [libjvm.so+0xb1e899] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1699 > > > and > > > # Internal Error (/workspace/open/src/hotspot/cpu/x86/assembler_x86.cpp:5095), pid=1431469, tid=1431493 > # Error: assert(VM_Version::supports_ssse3()) failed > > Current CompileTask: > C2: 468 240 % b 4 java.util.Arrays::fill @ 5 (21 bytes) > > Stack: [0x00007fdecd422000,0x00007fdecd523000], sp=0x00007fdecd51d8c0, free space=1006k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x73079c] 
Assembler::pshufb(XMMRegisterImpl*, XMMRegisterImpl*)+0x13c > V [libjvm.so+0x4005d1] ReplB_regNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x1a1 > V [libjvm.so+0x17be04e] PhaseOutput::scratch_emit_size(Node const*)+0x45e > V [libjvm.so+0x17b4548] PhaseOutput::shorten_branches(unsigned int*)+0x2d8 > V [libjvm.so+0x17c6faa] PhaseOutput::Output()+0xcfa > V [libjvm.so+0xb191bb] Compile::Code_Gen()+0x42b > V [libjvm.so+0xb1e899] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1699 > @vnkozlov I have fixed those errors in the last commits. The second one is due to `pshufb` being supported only on ssse3 machines. And the first one is because the constant table itself is not aligned enough given that currently, it is only aligned at 8 bytes. I chose to avoid the problem and only emit constants requiring at most 8 bytes of alignment as this patch has already touched many areas. A proper solution would be in a separate issue. What do you think? > I agree with doing in separate changes. And I will start new testing. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From shade at openjdk.org Wed Jul 27 17:26:05 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 27 Jul 2022 17:26:05 GMT Subject: RFR: 8291000: C2: Purge LoadPLocked and Store*Conditional nodes In-Reply-To: References: Message-ID: On Tue, 26 Jul 2022 07:13:01 GMT, Aleksey Shipilev wrote: > The last uses for these nodes was the inline contiguous allocations. With [JDK-8290706](https://bugs.openjdk.org/browse/JDK-8290706), these nodes are not used anymore and can be cleaned up. > > Testing: > - [x] Linux x86_64 fastdebug tier1 > - [x] Linux x86_32 fastdebug tier1 > - [x] Linux AArch64 fastdebug tier1 > - [x] Linux x86_64 Zero build > - [x] Linux AArch64 cross-build > - [x] Linux ARM cross-build > - [x] Linux S390X cross-build > - [x] Linux PPC64 cross-build > - [x] Linux RISC-V cross-build Thank you, I'll be integrating soon, if there are no objections. 
-------------

PR: https://git.openjdk.org/jdk/pull/9636

From asmehra at redhat.com Wed Jul 27 17:48:23 2022
From: asmehra at redhat.com (Ashutosh Mehra)
Date: Wed, 27 Jul 2022 13:48:23 -0400
Subject: Tool to peek into CDS archive
Message-ID:

Hi,
I am wondering if there is any way to peek into the CDS archive and look into its contents. I believe a tool that dumps the list of classes in the archive would be really helpful.

Regards,
Ashutosh Mehra

From calvin.cheung at oracle.com Wed Jul 27 19:24:46 2022
From: calvin.cheung at oracle.com (calvin.cheung at oracle.com)
Date: Wed, 27 Jul 2022 12:24:46 -0700
Subject: Tool to peek into CDS archive
In-Reply-To:
References:
Message-ID:

Hi Ashutosh,

You can list the contents of the classes.jsa by doing:

    java -XX:+PrintSharedArchiveAndExit -XX:+PrintSharedDictionary

If you have your own CDS archive, you can specify it using the -XX:SharedArchiveFile option as follows:

    java -XX:+PrintSharedArchiveAndExit -XX:+PrintSharedDictionary -XX:SharedArchiveFile=

Calvin

On 7/27/22 10:48 AM, Ashutosh Mehra wrote:
> Hi,
> I am wondering if there is any way to peek into the CDS archive and
> look into its contents. I believe a tool that dumps the list of
> classes in the archive would be really helpful.
>
> Regards,
> Ashutosh Mehra

From ioi.lam at oracle.com Wed Jul 27 19:49:36 2022
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 27 Jul 2022 12:49:36 -0700
Subject: Tool to peek into CDS archive
In-Reply-To:
References:
Message-ID: <71052879-a7af-4852-f87b-17a11ef29dbe@oracle.com>

Currently there's no such tool. The closest thing you can find is a map file, which can be generated when dumping the archive.
E.g.,

    java -Xshare:dump -Xlog:cds+map=debug:file=cds.map:none:filesize=0
    java -Xshare:dump -Xlog:cds+map=trace:file=cds.map:none:filesize=0

Also, if you want to see the list of classes:

    java -Xshare:dump -Xlog:cds+class=debug

Thanks
- Ioi

On 7/27/2022 10:48 AM, Ashutosh Mehra wrote:
> Hi,
> I am wondering if there is any way to peek into the CDS archive and
> look into its contents. I believe a tool that dumps the list of
> classes in the archive would be really helpful.
>
> Regards,
> Ashutosh Mehra

From tschatzl at openjdk.org Wed Jul 27 20:17:39 2022
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 27 Jul 2022 20:17:39 GMT
Subject: RFR: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() [v2]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> please review this change, which a) makes `G1CollectedHeap::heap_region_containing` fail when asked for the `HeapRegion*` of an uncommitted region and b) changes some callers to use the `heap_region_containing_or_null` method that had been intended there.
>
> Also remove some now unnecessary asserts, as `heap_region_containing` will now fail where it previously returned `nullptr`.
>
> Testing: tier1-5
>
> Thanks,
> Thomas

Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
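The split between a strict lookup and an `_or_null` lookup described in the quoted text can be sketched as follows. This is an assumed miniature for illustration, not G1 code: `RegionTable`, `commit`, and the index-based lookups are hypothetical, and a null slot stands in for an uncommitted region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of a heap region.
struct HeapRegion {
  std::size_t index;
};

class RegionTable {
  std::vector<HeapRegion*> _regions;  // nullptr models an uncommitted region

 public:
  explicit RegionTable(std::size_t n) : _regions(n, nullptr) {}

  void commit(std::size_t i, HeapRegion* r) { _regions[i] = r; }

  // Lenient lookup: the caller is expected to handle nullptr itself.
  HeapRegion* region_containing_or_null(std::size_t i) const {
    return _regions[i];
  }

  // Strict lookup: asserting here makes a null assert at every call site
  // redundant, since a null result can no longer reach the caller.
  HeapRegion* region_containing(std::size_t i) const {
    HeapRegion* r = _regions[i];
    assert(r != nullptr && "region must be committed");
    return r;
  }
};
```

A caller that can legitimately see an uncommitted region uses the `_or_null` variant and checks the result; every other caller uses the strict variant and drops its own null assert.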
------------- Changes: - all: https://git.openjdk.org/jdk/pull/9584/files - new: https://git.openjdk.org/jdk/pull/9584/files/b6afb444..b6afb444 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9584&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9584&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9584.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9584/head:pull/9584 PR: https://git.openjdk.org/jdk/pull/9584 From lmesnik at openjdk.org Wed Jul 27 20:36:45 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 27 Jul 2022 20:36:45 GMT Subject: RFR: 8289764: gc/lock tests failed with "OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects" In-Reply-To: References: Message-ID: On Wed, 27 Jul 2022 14:03:34 GMT, Ramkumar Sunderbabu wrote: > Tested with all GC options Changes requested by lmesnik (Reviewer). test/hotspot/jtreg/vmTestbase/gc/lock/LockerTest.java line 51: > 49: public void run() { > 50: locker.lock(); > 51: WhiteBox.getWhiteBox().fullGC(); Could you add some more work for GC here: add and free objects, arrays, strings and free them. Then call WB.GC to ensure that GC is triggered. You could add the corresponding method in GarbageUtils and always use it instead eatMemory. Might be call WB.YoungGC while generating garbage to put something into old gen as well as in young gen ------------- PR: https://git.openjdk.org/jdk/pull/9658 From asmehra at redhat.com Wed Jul 27 20:53:20 2022 From: asmehra at redhat.com (Ashutosh Mehra) Date: Wed, 27 Jul 2022 16:53:20 -0400 Subject: Tool to peek into CDS archive In-Reply-To: <71052879-a7af-4852-f87b-17a11ef29dbe@oracle.com> References: <71052879-a7af-4852-f87b-17a11ef29dbe@oracle.com> Message-ID: Thanks for the commands. These will definitely be useful when generating the archive. If I already have the archive, is there any way to check the contents? 
Regards,
Ashutosh Mehra

On Wed, Jul 27, 2022 at 3:50 PM Ioi Lam wrote:

> Currently there's no such tool. The very closest thing you can find is
> a map file, which can be generated when dumping the archive. E.g.,
>
> java -Xshare:dump -Xlog:cds+map=debug:file=cds.map:none:filesize=0
> java -Xshare:dump -Xlog:cds+map=trace:file=cds.map:none:filesize=0
>
> Also, if you want to see the list of classes:
>
> java -Xshare:dump -Xlog:cds+class=debug
>
> Thanks
> - Ioi
>
> On 7/27/2022 10:48 AM, Ashutosh Mehra wrote:
> > Hi,
> > I am wondering if there is any way to peek into the CDS archive and
> > look into its contents. I believe a tool that dumps the list of
> > classes in the archive would be really helpful.
> >
> > Regards,
> > Ashutosh Mehra
>

From kvn at openjdk.org Wed Jul 27 20:56:44 2022
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 27 Jul 2022 20:56:44 GMT
Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12]
In-Reply-To:
References:
Message-ID:

On Wed, 27 Jul 2022 09:40:45 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch improves the generation of broadcasting a scalar in several ways:
>>
>> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines.
>> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high.
>> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay
>>
>> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>>
>>                                          Before            After
>> Benchmark                  Mode  Cnt   Score   Error    Score   Error  Units    Gain
>> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598   38.771 ± 0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464   38.603 ± 0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791   13.755 ± 0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781   13.663 ± 0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases.
>>
>> This patch also removes some redundant code paths and renames some incorrectly named instructions.
>>
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   unnecessary TEMP dst

Got new failure (and testing still running). Test compiler/c2/cr7200264/TestSSE2IntVect.java failed with `-Xcomp`:

java.lang.RuntimeException: Unexpected SubVI number: expected 2 >= 4
    at jdk.test.lib.Asserts.fail(Asserts.java:594)
    at jdk.test.lib.Asserts.assertGreaterThanOrEqual(Asserts.java:288)
    at jdk.test.lib.Asserts.assertGTE(Asserts.java:259)
    at compiler.c2.cr7200264.TestDriver.verifyVectorizationNumber(TestDriver.java:65)
    at compiler.c2.cr7200264.TestDriver.run(TestDriver.java:43)
    at compiler.c2.cr7200264.TestSSE2IntVect.main(TestSSE2IntVect.java:48)

-------------

PR: https://git.openjdk.org/jdk/pull/7832

From asmehra at redhat.com Wed Jul 27 21:25:19 2022
From: asmehra at redhat.com (Ashutosh Mehra)
Date: Wed, 27 Jul 2022 17:25:19 -0400
Subject: Tool to peek into CDS archive
In-Reply-To:
References:
Message-ID:

Hi Calvin,

Thanks for the option. "-XX:+PrintSharedArchiveAndExit" is what I was looking for.
btw it looks like the PrintSharedDictionary flag is redundant, I guess PrintSharedArchiveAndExit already covers it.
I don't see PrintSharedDictionary being checked anywhere in the code base.

Regards,
Ashutosh Mehra

On Wed, Jul 27, 2022 at 5:02 PM wrote:

> Hi Ashutosh,
>
> You can list the contents of the classes.jsa by doing:
>
> java -XX:+PrintSharedArchiveAndExit -XX:+PrintSharedDictionary
>
> If you have your own CDS archive, you can specify it using the
> -XX:SharedArchiveFile option as follows:
>
> java -XX:+PrintSharedArchiveAndExit -XX:+PrintSharedDictionary
> -XX:SharedArchiveFile=
>
> Calvin
>
> On 7/27/22 10:48 AM, Ashutosh Mehra wrote:
> > Hi,
> > I am wondering if there is any way to peek into the CDS archive and
> > look into its contents. I believe a tool that dumps the list of
> > classes in the archive would be really helpful.
> >
> > Regards,
> > Ashutosh Mehra
>

From calvin.cheung at oracle.com Wed Jul 27 23:15:01 2022
From: calvin.cheung at oracle.com (calvin.cheung at oracle.com)
Date: Wed, 27 Jul 2022 16:15:01 -0700
Subject: [External] : Re: Tool to peek into CDS archive
In-Reply-To:
References:
Message-ID:

On 7/27/22 2:25 PM, Ashutosh Mehra wrote:
> Hi Calvin,
>
> Thanks for the option. "-XX:+PrintSharedArchiveAndExit" is what I was
> looking for.
> btw it looks like the PrintSharedDictionary flag is redundant, I guess
> PrintSharedArchiveAndExit already covers it.
> I don't see PrintSharedDictionary being checked anywhere in the code base.

You're correct. I only see the flag declared in globals.hpp.

./share/runtime/globals.hpp:  product(bool, PrintSharedDictionary, false,        \

In 8u, one needs to specify -XX:+PrintSharedDictionary in order to display the class names in a CDS archive.
I haven't looked into which release removed the processing of the flag.
Anyway, I've filed https://bugs.openjdk.org/browse/JDK-8291443 to clean it up.

thanks,
Calvin

>
> Regards,
> Ashutosh Mehra
>
>
> On Wed, Jul 27, 2022 at 5:02 PM wrote:
>
> Hi Ashutosh,
>
> You can list the contents of the classes.jsa by doing:
>
> java -XX:+PrintSharedArchiveAndExit -XX:+PrintSharedDictionary
>
> If you have your own CDS archive, you can specify it using the
> -XX:SharedArchiveFile option as follows:
>
> java -XX:+PrintSharedArchiveAndExit -XX:+PrintSharedDictionary
> -XX:SharedArchiveFile=
>
> Calvin
>
>
> On 7/27/22 10:48 AM, Ashutosh Mehra wrote:
> > Hi,
> > I am wondering if there is any way to peek into the CDS archive and
> > look into its contents. I believe a tool that dumps the list of
> > classes in the archive would be really helpful.
> >
> > Regards,
> > Ashutosh Mehra
>

From duke at openjdk.org Thu Jul 28 02:55:44 2022
From: duke at openjdk.org (Quan Anh Mai)
Date: Thu, 28 Jul 2022 02:55:44 GMT
Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12]
In-Reply-To:
References:
Message-ID:

On Wed, 27 Jul 2022 09:40:45 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch improves the generation of broadcasting a scalar in several ways:
>>
>> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines.
>> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high.
>> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay
>>
>> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows:
>>
>>                                          Before            After
>> Benchmark                  Mode  Cnt   Score   Error    Score   Error  Units    Gain
>> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598   38.771 ± 0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464   38.603 ± 0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791   13.755 ± 0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781   13.663 ± 0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases.
>>
>> This patch also removes some redundant code paths and renames some incorrectly named instructions.
>>
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   unnecessary TEMP dst

It does not seem related, as this patch has effects only after matching, so it should not change the IR graph of the compilations.

-------------

PR: https://git.openjdk.org/jdk/pull/7832

From shade at openjdk.org Thu Jul 28 08:20:07 2022
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 28 Jul 2022 08:20:07 GMT
Subject: Integrated: 8291000: C2: Purge LoadPLocked and Store*Conditional nodes
In-Reply-To:
References:
Message-ID:

On Tue, 26 Jul 2022 07:13:01 GMT, Aleksey Shipilev wrote:

> The last use of these nodes was the inline contiguous allocations. With [JDK-8290706](https://bugs.openjdk.org/browse/JDK-8290706), these nodes are not used anymore and can be cleaned up.
>
> Testing:
> - [x] Linux x86_64 fastdebug tier1
> - [x] Linux x86_32 fastdebug tier1
> - [x] Linux AArch64 fastdebug tier1
> - [x] Linux x86_64 Zero build
> - [x] Linux AArch64 cross-build
> - [x] Linux ARM cross-build
> - [x] Linux S390X cross-build
> - [x] Linux PPC64 cross-build
> - [x] Linux RISC-V cross-build

This pull request has now been integrated.

Changeset: dd69a68d
Author:    Aleksey Shipilev
URL:       https://git.openjdk.org/jdk/commit/dd69a68d095a67b6ea1479d05285dd8be50bfbf2
Stats:     556 lines in 17 files changed: 0 ins; 552 del; 4 mod

8291000: C2: Purge LoadPLocked and Store*Conditional nodes

Reviewed-by: eosterlund, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/9636

From aph at openjdk.org Thu Jul 28 08:28:06 2022
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 28 Jul 2022 08:28:06 GMT
Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 27 Jul 2022 10:54:26 GMT, Evgeny Astigeevich wrote:

>> The addition is
>> `PhaseOutput* phase_output = Compile::current()->output();`
>> then
>> `phase_output != NULL && phase_output->in_scratch_emit_size()`
>>
>> so AFAICS `Compile::current()->output()` is now checked for null, where it was not before.
>
> Now I get it. Thank you.
>
> I agree this looks suspicious. I could not recall why I added it.
> Debugging helped me to find out.
> During the parsing phase of C2 compilation `ciTypeFlow::StateVector::do_invoke` causes `LinkResolver::resolve_static_call` which now has the following code:
>
>   if (resolved_method->is_continuation_enter_intrinsic()
>       && resolved_method->from_interpreted_entry() == NULL) { // does a load_acquire
>     methodHandle mh(THREAD, resolved_method);
>     // Generate a compiled form of the enterSpecial intrinsic.
>     AdapterHandlerLibrary::create_native_wrapper(mh);
>   }
>
> We generate a wrapper which is `nmethod` with trampoline calls.
> As we are in the parsing phase the output is not created.
> I can move `Compile::current()->output() != NULL` into the preceding IF and update the comment to the following:
>
>   Make sure this is code generation of a C2 compilation when Compile::current()->output() is not NULL.
>   C2 can generate native wrappers for the continuation enter intrinsic before code generation.
>   C1 allocates space only for trampoline stubs generated by Call LIR ops.

This is all rather complicated and obscure. It seems to me that passing a bool `check_emit_size` is exactly what we should do: it's more explicit and helps the reader.

-------------

PR: https://git.openjdk.org/jdk/pull/9592

From jiefu at openjdk.org Thu Jul 28 08:58:39 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Thu, 28 Jul 2022 08:58:39 GMT
Subject: RFR: 8291106: ZPlatformGranuleSizeShift is redundant [v5]
In-Reply-To:
References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com>
Message-ID:

On Wed, 27 Jul 2022 09:18:09 GMT, Leslie Zhai wrote:

>> Hi,
>>
>> When I tune `ZPageSizeSmall` from 2 MB to 16 MB:
>>
>>
>> const size_t ZPlatformGranuleSizeShift = 24; // 16MB
>>
>> // Granule shift/size
>> const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift;
>>
>> // Page size shifts
>> const size_t ZPageSizeSmallShift = ZGranuleSizeShift;
>>
>> // Page sizes
>> const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift;
>>
>>
>> `zBitField` failed to work:
>>
>>
>> Internal Error
>> (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069
>> # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value
>> #
>> # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64)
>>
>>
>> Perhaps `mask` also needs to be adjusted:
>>
>>
>> static size_t object_index(oop obj) {
>>   const uintptr_t addr = ZOop::to_address(obj);
>>   const uintptr_t offset = ZAddress::offset(addr);
>>   const uintptr_t mask = ZGranuleSize - 1;
>>   return (offset & mask) >> ZObjectAlignmentSmallShift;
>> }
>>
>>
>> Back to the point: ZPlatformGranuleSizeShift is redundant because it cannot be modified (for example, to `24`), so I removed `ZPlatformGranuleSizeShift` from the CPU backends, keeping it only for debugging AArch64 to see the issue.
>>
>> Please review my patch.
>>
>> Thanks,
>> Leslie Zhai
>
> Leslie Zhai has updated the pull request incrementally with one additional commit since the last revision:
>
>   8291106: ZPlatformGranuleSizeShift is redundant

Marked as reviewed by jiefu (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/9582

From lzhai at openjdk.org Thu Jul 28 09:02:01 2022
From: lzhai at openjdk.org (Leslie Zhai)
Date: Thu, 28 Jul 2022 09:02:01 GMT
Subject: Integrated: 8291106: ZPlatformGranuleSizeShift is redundant
In-Reply-To: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com>
References: <_FnzZYPNYxiAi9jAUQWdAxJA6opZCl1aItEC2gP2OtI=.6d3d4a78-0195-43f0-bf50-e1bec728c9da@github.com>
Message-ID:

On Thu, 21 Jul 2022 02:17:33 GMT, Leslie Zhai wrote:

> Hi,
>
> When I tune `ZPageSizeSmall` from 2 MB to 16 MB:
>
>
> const size_t ZPlatformGranuleSizeShift = 24; // 16MB
>
> // Granule shift/size
> const size_t ZGranuleSizeShift = ZPlatformGranuleSizeShift;
>
> // Page size shifts
> const size_t ZPageSizeSmallShift = ZGranuleSizeShift;
>
> // Page sizes
> const size_t ZPageSizeSmall = (size_t)1 << ZPageSizeSmallShift;
>
>
> `zBitField` failed to work:
>
>
> Internal Error
> (/home/zhaixiang/jdk/src/hotspot/share/gc/z/zBitField.hpp:76), pid=923047, tid=923069
> # assert(((ContainerType)value & (FieldMask << ValueShift)) == (ContainerType)value) failed: Invalid value
> #
> # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.root.jdk)
> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-aarch64)
>
>
> Perhaps `mask` also needs to be adjusted:
>
>
> static size_t object_index(oop obj) {
>   const uintptr_t addr = ZOop::to_address(obj);
>   const uintptr_t offset = ZAddress::offset(addr);
>   const uintptr_t mask = ZGranuleSize - 1;
>   return (offset & mask) >> ZObjectAlignmentSmallShift;
> }
>
>
> Back to the point: ZPlatformGranuleSizeShift is redundant because it cannot be modified (for example, to `24`), so I removed `ZPlatformGranuleSizeShift` from the CPU backends, keeping it only for debugging AArch64 to see the issue.
>
> Please review my patch.
>
> Thanks,
> Leslie Zhai

This pull request has now been integrated.

Changeset: 97fc8deb
Author:    Leslie Zhai
Committer: Jie Fu
URL:       https://git.openjdk.org/jdk/commit/97fc8deb1db6deb5f841d64f5e8e3b825783a680
Stats:     10 lines in 5 files changed: 0 ins; 4 del; 6 mod

8291106: ZPlatformGranuleSizeShift is redundant

Reviewed-by: eosterlund, jiefu

-------------

PR: https://git.openjdk.org/jdk/pull/9582

From bulasevich at openjdk.org Thu Jul 28 09:33:59 2022
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 28 Jul 2022 09:33:59 GMT
Subject: RFR: 8288477: nmethod header size reduction [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 21 Jul 2022 18:24:33 GMT, Tom Rodriguez wrote:

>> Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>>
>>   Undo applying CompLevel where applicable. It must be a separate change
>
> src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 258:
>
>> 256:     \
>> 257:     nonstatic_field(nmethod, _verified_entry_point, address) \
>> 258:     nonstatic_field(nmethod, _comp_level, int) \
>
> You should declare CompLevel in this file as well.
I think it might be missing the sanity checking that detect missing type declarations. Hmm. Actually most of the types used in vmStructs_jvmci.cpp are not declared in VM_TYPES: - int, intptr_t, jbyte, jint, jlong, juint, u1, u2, u4, uint, uint64_t, uintptr_t, unsigned int, void* - AccessFlags, Annotations*, ClassLoaderData*, CollectedHeap*, CompiledMethod*, ConstMethod*, - JavaFrameAnchor, JavaThread*, MethodCounters*, MethodData*, ObjectWaiter*, OopHandle, OSThread*, Thread* is it an issue? ------------- PR: https://git.openjdk.org/jdk/pull/9165 From bulasevich at openjdk.org Thu Jul 28 09:39:52 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 28 Jul 2022 09:39:52 GMT Subject: RFR: 8288477: nmethod header size reduction [v4] In-Reply-To: References: Message-ID: > Each compiled method contains an nmethod header. In trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header paddings, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. > > The change tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime > > Renaissance benchmarks shows no performance regressions on x86 and aarch. 
> > BEFORE: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > /* 8 | 4 */ const CompilerType _type; <<<< > /* 12 | 4 */ int _size; > /* 16 | 4 */ int _header_size; > /* 20 | 4 */ int _frame_complete_offset; > /* 24 | 4 */ int _data_offset; > /* 28 | 4 */ int _frame_size; > /* 32 | 8 */ address _code_begin; > /* 40 | 8 */ address _code_end; > /* 48 | 8 */ address _content_begin; > /* 56 | 8 */ address _data_end; > /* 64 | 8 */ address _relocation_begin; > /* 72 | 8 */ address _relocation_end; > /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 88 | 1 */ bool _caller_must_gc_arguments; > /* 89 | 1 */ bool _is_compiled; > /* XXX 6-byte hole */ > /* 96 | 8 */ const char *_name; > /* 104 | 8 */ class AsmRemarks { > /* 104 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 112 | 8 */ class DbgStrings { > /* 112 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 120 */ > } > > AFTER: > > (gdb) ptype /o CodeBlob > /* offset | size */ type = class CodeBlob { > protected: > /* 8 | 8 */ address _code_begin; > /* 16 | 8 */ address _code_end; > /* 24 | 8 */ address _content_begin; > /* 32 | 8 */ address _data_end; > /* 40 | 8 */ address _relocation_begin; > /* 48 | 8 */ address _relocation_end; > /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; > /* 64 | 8 */ const char *_name; > /* 72 | 4 */ int _size; > /* 76 | 4 */ int _header_size; > /* 80 | 4 */ int _frame_complete_offset; > /* 84 | 4 */ int _data_offset; > /* 88 | 4 */ int _frame_size; > /* 92 | 1 */ bool _caller_must_gc_arguments; > /* 93 | 1 */ bool _is_compiled; > /* 94 | 1 */ const CompilerType _type; <<<< > /* XXX 1-byte hole */ > /* 96 | 8 */ class AsmRemarks { > /* 96 | 8 */ AsmRemarkCollection *_remarks; > } _asm_remarks; > /* 104 | 8 */ class DbgStrings { > /* 104 | 8 */ DbgStringCollection *_strings; > } _dbg_strings; > > /* total size (bytes): 112 */ > } > > BEFORE: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public 
CompiledMethod { > private: > /* 208 | 4 */ int _entry_bci; > /* XXX 4-byte hole */ > /* 216 | 8 */ uint64_t _gc_epoch; > /* 224 | 8 */ nmethod *_osr_link; > /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 240 | 8 */ address _entry_point; > /* 248 | 8 */ address _verified_entry_point; > /* 256 | 8 */ address _osr_entry_point; > /* 264 | 4 */ int _exception_offset; > /* 268 | 4 */ int _unwind_handler_offset; > /* 272 | 4 */ int _consts_offset; > /* 276 | 4 */ int _stub_offset; > /* 280 | 4 */ int _oops_offset; > /* 284 | 4 */ int _metadata_offset; > /* 288 | 4 */ int _scopes_data_offset; > /* 292 | 4 */ int _scopes_pcs_offset; > /* 296 | 4 */ int _dependencies_offset; > /* 300 | 4 */ int _handler_table_offset; > /* 304 | 4 */ int _nul_chk_table_offset; > /* 308 | 4 */ int _speculations_offset; > /* 312 | 4 */ int _jvmci_data_offset; > /* 316 | 4 */ int _nmethod_end_offset; > /* 320 | 4 */ int _orig_pc_offset; > /* 324 | 4 */ int _compile_id; > /* 328 | 4 */ int _comp_level; <<<< > /* 332 | 1 */ bool _has_flushed_dependencies; > /* 333 | 1 */ bool _unload_reported; > /* 334 | 1 */ bool _load_reported; > /* 335 | 1 */ volatile signed char _state; > /* 336 | 1 */ bool _oops_are_stale; > /* XXX 3-byte hole */ > /* 340 | 4 */ RTMState _rtm_state; > /* 344 | 4 */ volatile jint _lock_count; > /* XXX 4-byte hole */ > /* 352 | 8 */ volatile int64_t _stack_traversal_mark; > /* 360 | 4 */ int _hotness_counter; > /* 364 | 1 */ volatile uint8_t _is_unloading_state; > /* XXX 3-byte hole */ > /* 368 | 4 */ ByteSize _native_receiver_sp_offset; > /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; > > /* total size (bytes): 376 */ > } > > AFTER: > > (gdb) ptype /o nmethod > /* offset | size */ type = class nmethod : public CompiledMethod { > /* 200 | 8 */ uint64_t _gc_epoch; > /* 208 | 8 */ volatile int64_t _stack_traversal_mark; > /* 216 | 8 */ nmethod *_osr_link; > /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; > /* 232 | 8 */ 
address _entry_point; > /* 240 | 8 */ address _verified_entry_point; > /* 248 | 8 */ address _osr_entry_point; > /* 256 | 4 */ int _entry_bci; > /* 260 | 4 */ int _exception_offset; > /* 264 | 4 */ int _unwind_handler_offset; > /* 268 | 4 */ int _consts_offset; > /* 272 | 4 */ int _stub_offset; > /* 276 | 4 */ int _oops_offset; > /* 280 | 4 */ int _metadata_offset; > /* 284 | 4 */ int _scopes_data_offset; > /* 288 | 4 */ int _scopes_pcs_offset; > /* 292 | 4 */ int _dependencies_offset; > /* 296 | 4 */ int _handler_table_offset; > /* 300 | 4 */ int _nul_chk_table_offset; > /* 304 | 4 */ int _speculations_offset; > /* 308 | 4 */ int _jvmci_data_offset; > /* 312 | 4 */ int _nmethod_end_offset; > /* 316 | 4 */ int _orig_pc_offset; > /* 320 | 4 */ int _compile_id; > /* 324 | 4 */ RTMState _rtm_state; > /* 328 | 4 */ volatile jint _lock_count; > /* 332 | 4 */ int _hotness_counter; > /* 336 | 4 */ ByteSize _native_receiver_sp_offset; > /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; > /* 344 | 1 */ CompLevel _comp_level; <<<< > /* 345 | 1 */ volatile uint8_t _is_unloading_state; > /* 346 | 1 */ bool _has_flushed_dependencies; > /* 347 | 1 */ bool _unload_reported; > /* 348 | 1 */ bool _load_reported; > /* 349 | 1 */ volatile signed char _state; > /* 350 | 1 */ bool _oops_are_stale; > > /* total size (bytes): 352 */ > } Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: declare CompLevel properly in vmStructs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9165/files - new: https://git.openjdk.org/jdk/pull/9165/files/2d2a07af..3acf5dd7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9165&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9165&range=02-03 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9165.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9165/head:pull/9165 PR: https://git.openjdk.org/jdk/pull/9165 From duke 
at openjdk.org Thu Jul 28 09:53:38 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 28 Jul 2022 09:53:38 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v3] In-Reply-To: References: Message-ID: > `trampoline_call` can do dummy code generation to calculate the size of C2 generated code. This is done in the output phase. In [src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042](https://github.com/openjdk/jdk/blob/e0d361cea91d3dd1450aece73f660b4abb7ce5fa/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L1042) Loom code needed to generate a trampoline call outside of C2 and without the output phase. This caused test crashes. The project Loom added `trampoline_call1` to workaround the crashes. > > This PR improves detection of C2 output phase which makes `trampoline_call1` redundant. > > Tested the fastdebug/release builds: > - `'gtest`: Passed > - `tier1`...`tier2`: Passed Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Restore check_emit_size parameter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9592/files - new: https://git.openjdk.org/jdk/pull/9592/files/e732890b..4b5953b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9592&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9592&range=01-02 Stats: 16 lines in 3 files changed: 1 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/9592.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9592/head:pull/9592 PR: https://git.openjdk.org/jdk/pull/9592 From duke at openjdk.org Thu Jul 28 09:57:51 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Thu, 28 Jul 2022 09:57:51 GMT Subject: RFR: 8287393: AArch64: Remove trampoline_call1 [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jul 2022 08:25:46 GMT, Andrew Haley wrote: >> Now I get it. Thank you. >> >> I agree this looks suspicious. I could not recall why I added it. >> Debugging helped me to find out. 
>> During the parsing phase of C2 compilation `ciTypeFlow::StateVector::do_invoke` causes `LinkResolver::resolve_static_call` which now has the following code: >> >> if (resolved_method->is_continuation_enter_intrinsic() >> && resolved_method->from_interpreted_entry() == NULL) { // does a load_acquire >> methodHandle mh(THREAD, resolved_method); >> // Generate a compiled form of the enterSpecial intrinsic. >> AdapterHandlerLibrary::create_native_wrapper(mh); >> } >> >> We generate a wrapper which is `nmethod` with trampoline calls. >> As we are in the parsing phase the output is not created. >> I can move `Compile::current()->output() != NULL` into the preceding IF and update the comment to the following: >> >> Make sure this is code generation of a C2 compilation when Compile::current()->output() is not NULL. >> C2 can generate native wrappers for the continuation enter intrinsic before code generation. >> C1 allocates space only for trampoline stubs generated by Call LIR ops. > > This is all rather complicated and obscure. It seems to me that passing a bool `check_emit_size` is exactly what we should do: it's more explicit and helps the reader. I've restored `check_emit_size` and created an assert to guard it is properly used. I'll remove `cbuf` by fixing: https://bugs.openjdk.org/browse/JDK-8287394 ------------- PR: https://git.openjdk.org/jdk/pull/9592 From tschatzl at openjdk.org Thu Jul 28 10:15:41 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 28 Jul 2022 10:15:41 GMT Subject: RFR: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that fixes some callers of `G1CollectedHeap::heap_region_containing` to a) fail on trying to get the `Heapregion*` on uncommitted regions and b) change some callers to use the correct `heap_region_containing_or_null` method that had been intended there. 
> > Also remove some now unneccessary asserts as `heap_region_containing` will fail when it previously returned `nullptr`. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into 8290715-heapregion-containing-no-null - Some more removal of unnecessary checks - Initial changes - Fix the places where the wrong heap_region_containing() method has been used - kbarrett review - Remove some more casts - Revert changes to heap_region_containing() - Remove cast - Remove unnecessary changes - some changes - ... and 1 more: https://git.openjdk.org/jdk/compare/97fc8deb...5dc2fc08 ------------- Changes: https://git.openjdk.org/jdk/pull/9584/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9584&range=02 Stats: 23 lines in 5 files changed: 1 ins; 8 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/9584.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9584/head:pull/9584 PR: https://git.openjdk.org/jdk/pull/9584 From duke at openjdk.org Thu Jul 28 15:16:42 2022 From: duke at openjdk.org (Axel Boldt-Christmas) Date: Thu, 28 Jul 2022 15:16:42 GMT Subject: RFR: 8291237: Encapsulate nmethod Deoptimization logic In-Reply-To: References: Message-ID: On Wed, 27 Jul 2022 12:55:04 GMT, Axel Boldt-Christmas wrote: > The proposal is to encapsulate the nmethod mark for deoptimization logic in one place and only allow access to the `mark_for_deoptimization` from a closure object: > ```C++ > class DeoptimizationMarkerClosure : StackObj { > public: > virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0; > }; > > This closure takes a `MarkFn` which it uses to mark which nmethods should be deoptimized. This marking can only be done through the `MarkFn` and a `MarkFn` can only be created in the following code which runs the closure. 
> ```C++ > { > NoSafepointVerifier nsv; > assert_locked_or_safepoint(Compile_lock); > marker_closure.marker_do(MarkFn()); > anything_deoptimized = deoptimize_all_marked(); > } > if (anything_deoptimized) { > run_deoptimize_closure(); > } > > This ensures that this logic is encapsulated and the `NoSafepointVerifier` and `assert_locked_or_safepoint(Compile_lock)` makes `deoptimize_all_marked` not having to scan the whole code cache sound. > > The exception to this pattern, from `InstanceKlass::unload_class`, is discussed in the JBS issue, and gives reasons why not marking for deoptimization there is ok. > > An effect of this encapsulation is that the deoptimization logic was moved from the `CodeCache` class to the `Deoptimization` class and the class redefinition logic was moved from the `CodeCache` class to the `VM_RedefineClasses` class/operation. > > Testing: Tier 1-5 @fisk suggested using a RAII context object instead of a closure to guarantee the encapsulated invariants. Testing the patch now and will push when done. Not having to create a closure everywhere makes the code less verbose and more readable. ------------- PR: https://git.openjdk.org/jdk/pull/9655 From kvn at openjdk.org Thu Jul 28 16:49:16 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 28 Jul 2022 16:49:16 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: Message-ID: On Wed, 27 Jul 2022 09:40:45 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the generation of broadcasting a scalar in several ways: >> >> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. >> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. 
>> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay >> >> With this patch, the result of the added benchmark, which performs some operations with really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follows: >> >> Before After >> Benchmark Mode Cnt Score Error Score Error Units Gain >> SpiltReplicate.testDouble avgt 5 42.621 ± 0.598 38.771 ± 0.797 ns/op +9.03% >> SpiltReplicate.testFloat avgt 5 42.245 ± 1.464 38.603 ± 0.367 ns/op +8.62% >> SpiltReplicate.testInt avgt 5 20.581 ± 5.791 13.755 ± 0.375 ns/op +33.17% >> SpiltReplicate.testLong avgt 5 17.794 ± 4.781 13.663 ± 0.387 ns/op +23.22% >> >> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. >> >> This patch also removes some redundant code paths and renames some incorrectly named instructions. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > unnecessary TEMP dst I verified that the latest failure I posted is not related to these changes. There were no other failures. Approved. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/7832 From kvn at openjdk.org Thu Jul 28 17:50:49 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 28 Jul 2022 17:50:49 GMT Subject: RFR: 8288477: nmethod header size reduction [v4] In-Reply-To: References: Message-ID: <69WqTWeQwA6YLN15ZsxGHWX-ygtu3BpUrg2aWuoXBuE=.2d1fbfd5-afb0-4403-9401-e28d9c297bcd@github.com> On Thu, 28 Jul 2022 09:39:52 GMT, Boris Ulasevich wrote: >> Each compiled method contains an nmethod header. In the trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger.
With this change, I suggest sorting the header data fields from largest to smallest to minimize header padding, and using one byte for the CompilerType and CompLevel values. >> >> Cleanup work: apply CompLevel type where applicable. >> >> The change was tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >> >> Renaissance benchmarks show no performance regressions on x86 and aarch. >> >> BEFORE: >> >> (gdb) ptype /o CodeBlob >> /* offset | size */ type = class CodeBlob { >> /* 8 | 4 */ const CompilerType _type; <<<< >> /* 12 | 4 */ int _size; >> /* 16 | 4 */ int _header_size; >> /* 20 | 4 */ int _frame_complete_offset; >> /* 24 | 4 */ int _data_offset; >> /* 28 | 4 */ int _frame_size; >> /* 32 | 8 */ address _code_begin; >> /* 40 | 8 */ address _code_end; >> /* 48 | 8 */ address _content_begin; >> /* 56 | 8 */ address _data_end; >> /* 64 | 8 */ address _relocation_begin; >> /* 72 | 8 */ address _relocation_end; >> /* 80 | 8 */ ImmutableOopMapSet *_oop_maps; >> /* 88 | 1 */ bool _caller_must_gc_arguments; >> /* 89 | 1 */ bool _is_compiled; >> /* XXX 6-byte hole */ >> /* 96 | 8 */ const char *_name; >> /* 104 | 8 */ class AsmRemarks { >> /* 104 | 8 */ AsmRemarkCollection *_remarks; >> } _asm_remarks; >> /* 112 | 8 */ class DbgStrings { >> /* 112 | 8 */ DbgStringCollection *_strings; >> } _dbg_strings; >> >> /* total size (bytes): 120 */ >> } >> >> AFTER: >> >> (gdb) ptype /o CodeBlob >> /* offset | size */ type = class CodeBlob { >> protected: >> /* 8 | 8 */ address _code_begin; >> /* 16 | 8 */ address _code_end; >> /* 24 | 8 */ address _content_begin; >> /* 32 | 8 */ address _data_end; >> /* 40 | 8 */ address _relocation_begin; >> /* 48 | 8 */ address _relocation_end; >> /* 56 | 8 */ ImmutableOopMapSet *_oop_maps; >> /* 64 | 8 */ const char *_name; >> /* 72 | 4 */ int _size; >> /* 76 | 4 */ int _header_size; >> /* 80 | 4 */ int _frame_complete_offset; >> /* 84 | 4 */ int _data_offset; >> /* 88 | 4 */ int _frame_size; >> /*
92 | 1 */ bool _caller_must_gc_arguments; >> /* 93 | 1 */ bool _is_compiled; >> /* 94 | 1 */ const CompilerType _type; <<<< >> /* XXX 1-byte hole */ >> /* 96 | 8 */ class AsmRemarks { >> /* 96 | 8 */ AsmRemarkCollection *_remarks; >> } _asm_remarks; >> /* 104 | 8 */ class DbgStrings { >> /* 104 | 8 */ DbgStringCollection *_strings; >> } _dbg_strings; >> >> /* total size (bytes): 112 */ >> } >> >> BEFORE: >> >> (gdb) ptype /o nmethod >> /* offset | size */ type = class nmethod : public CompiledMethod { >> private: >> /* 208 | 4 */ int _entry_bci; >> /* XXX 4-byte hole */ >> /* 216 | 8 */ uint64_t _gc_epoch; >> /* 224 | 8 */ nmethod *_osr_link; >> /* 232 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; >> /* 240 | 8 */ address _entry_point; >> /* 248 | 8 */ address _verified_entry_point; >> /* 256 | 8 */ address _osr_entry_point; >> /* 264 | 4 */ int _exception_offset; >> /* 268 | 4 */ int _unwind_handler_offset; >> /* 272 | 4 */ int _consts_offset; >> /* 276 | 4 */ int _stub_offset; >> /* 280 | 4 */ int _oops_offset; >> /* 284 | 4 */ int _metadata_offset; >> /* 288 | 4 */ int _scopes_data_offset; >> /* 292 | 4 */ int _scopes_pcs_offset; >> /* 296 | 4 */ int _dependencies_offset; >> /* 300 | 4 */ int _handler_table_offset; >> /* 304 | 4 */ int _nul_chk_table_offset; >> /* 308 | 4 */ int _speculations_offset; >> /* 312 | 4 */ int _jvmci_data_offset; >> /* 316 | 4 */ int _nmethod_end_offset; >> /* 320 | 4 */ int _orig_pc_offset; >> /* 324 | 4 */ int _compile_id; >> /* 328 | 4 */ int _comp_level; <<<< >> /* 332 | 1 */ bool _has_flushed_dependencies; >> /* 333 | 1 */ bool _unload_reported; >> /* 334 | 1 */ bool _load_reported; >> /* 335 | 1 */ volatile signed char _state; >> /* 336 | 1 */ bool _oops_are_stale; >> /* XXX 3-byte hole */ >> /* 340 | 4 */ RTMState _rtm_state; >> /* 344 | 4 */ volatile jint _lock_count; >> /* XXX 4-byte hole */ >> /* 352 | 8 */ volatile int64_t _stack_traversal_mark; >> /* 360 | 4 */ int _hotness_counter; >> /* 364 | 1 */ 
volatile uint8_t _is_unloading_state; >> /* XXX 3-byte hole */ >> /* 368 | 4 */ ByteSize _native_receiver_sp_offset; >> /* 372 | 4 */ ByteSize _native_basic_lock_sp_offset; >> >> /* total size (bytes): 376 */ >> } >> >> AFTER: >> >> (gdb) ptype /o nmethod >> /* offset | size */ type = class nmethod : public CompiledMethod { >> /* 200 | 8 */ uint64_t _gc_epoch; >> /* 208 | 8 */ volatile int64_t _stack_traversal_mark; >> /* 216 | 8 */ nmethod *_osr_link; >> /* 224 | 8 */ nmethod::oops_do_mark_link * volatile _oops_do_mark_link; >> /* 232 | 8 */ address _entry_point; >> /* 240 | 8 */ address _verified_entry_point; >> /* 248 | 8 */ address _osr_entry_point; >> /* 256 | 4 */ int _entry_bci; >> /* 260 | 4 */ int _exception_offset; >> /* 264 | 4 */ int _unwind_handler_offset; >> /* 268 | 4 */ int _consts_offset; >> /* 272 | 4 */ int _stub_offset; >> /* 276 | 4 */ int _oops_offset; >> /* 280 | 4 */ int _metadata_offset; >> /* 284 | 4 */ int _scopes_data_offset; >> /* 288 | 4 */ int _scopes_pcs_offset; >> /* 292 | 4 */ int _dependencies_offset; >> /* 296 | 4 */ int _handler_table_offset; >> /* 300 | 4 */ int _nul_chk_table_offset; >> /* 304 | 4 */ int _speculations_offset; >> /* 308 | 4 */ int _jvmci_data_offset; >> /* 312 | 4 */ int _nmethod_end_offset; >> /* 316 | 4 */ int _orig_pc_offset; >> /* 320 | 4 */ int _compile_id; >> /* 324 | 4 */ RTMState _rtm_state; >> /* 328 | 4 */ volatile jint _lock_count; >> /* 332 | 4 */ int _hotness_counter; >> /* 336 | 4 */ ByteSize _native_receiver_sp_offset; >> /* 340 | 4 */ ByteSize _native_basic_lock_sp_offset; >> /* 344 | 1 */ CompLevel _comp_level; <<<< >> /* 345 | 1 */ volatile uint8_t _is_unloading_state; >> /* 346 | 1 */ bool _has_flushed_dependencies; >> /* 347 | 1 */ bool _unload_reported; >> /* 348 | 1 */ bool _load_reported; >> /* 349 | 1 */ volatile signed char _state; >> /* 350 | 1 */ bool _oops_are_stale; >> >> /* total size (bytes): 352 */ >> } > > Boris Ulasevich has updated the pull request incrementally with one 
additional commit since the last revision: > > declare CompLevel properly in vmStructs Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/9165 From jbhateja at openjdk.org Thu Jul 28 18:20:00 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 28 Jul 2022 18:20:00 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: <0TH2Cv2t4pTvoEZ9c4MLAtMqTWF2_tHYwFq-Z_pmbbQ=.5ab14625-26c7-4cb8-914e-51b3059a69fb@github.com> References: <0TH2Cv2t4pTvoEZ9c4MLAtMqTWF2_tHYwFq-Z_pmbbQ=.5ab14625-26c7-4cb8-914e-51b3059a69fb@github.com> Message-ID: On Tue, 26 Jul 2022 12:48:16 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4388: >> >>> 4386: >>> 4387: void MacroAssembler::vallones(XMMRegister dst, int vector_len) { >>> 4388: // vpcmpeqd has special dependency treatment so it should be preferred to vpternlogd >> >> The comment is not clear; adding a relevant reference would add more value. > > I have re-measured this, and it seems that only the non-VEX encoded version receives the special dependency treatment, so I reverted this change and added a comment for clarification. > > The optimisation is noted in [The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers](https://www.agner.org/optimize/) for several architectures, for example in section 9.8 (Register allocation and renaming in Sandy Bridge and Ivy Bridge pipeline). > > I have performed measurements on uica.uops.info. While this sequence gives 1.37 cycles/iteration on Skylake and Icelake > > pcmpeqd xmm0, xmm0 > paddd xmm0, xmm1 > paddd xmm0, xmm1 > paddd xmm0, xmm1 > > this version has a throughput of 4 cycles/iteration > > vpcmpeqd xmm0, xmm0, xmm0 > vpaddd xmm0, xmm1, xmm0 > vpaddd xmm0, xmm1, xmm0 > vpaddd xmm0, xmm1, xmm0 > > This indicates that `vpcmpeqd` fails to break the dependency on `xmm0`, as opposed to the `pcmpeqd` instruction. > > Thanks.
Both of the above JIT sequences have a true dependency chain, so there is no scope for any additional architecture-imposed false dependency to cause further perf degradation, which is what we use dep-breaking idioms for. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From never at openjdk.org Thu Jul 28 18:30:39 2022 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 28 Jul 2022 18:30:39 GMT Subject: RFR: 8288477: nmethod header size reduction [v4] In-Reply-To: References: Message-ID: On Thu, 28 Jul 2022 09:31:39 GMT, Boris Ulasevich wrote: >> src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 258: >> >>> 256: \ >>> 257: nonstatic_field(nmethod, _verified_entry_point, address) \ >>> 258: nonstatic_field(nmethod, _comp_level, int) \ >> >> You should declare CompLevel in this file as well. I think it might be missing the sanity checking that detects missing type declarations. > > Hmm. Actually most of the types used in vmStructs_jvmci.cpp are not declared in VM_TYPES: > - int, intptr_t, jbyte, jint, jlong, juint, u1, u2, u4, uint, uint64_t, uintptr_t, unsigned int, void* > - AccessFlags, Annotations*, ClassLoaderData*, CollectedHeap*, CompiledMethod*, ConstMethod*, > - JavaFrameAnchor, JavaThread*, MethodCounters*, MethodData*, ObjectWaiter*, OopHandle, OSThread*, Thread* > > Is it an issue? You're right, it's fine to leave it out. Vladimir had added some sanity checks to the JVMCI vmStructs in https://bugs.openjdk.org/browse/JDK-8237497 and I'd assumed it was all of the checks that the regular vmStructs does, but it's not. The regular VMStructs requires that all reachable types are fully described, but JVMCI actually doesn't use the VMTypeEntry/declare_toplevel_type stuff at all. It's not exposed API so it should be dropped, I think. A debug build will catch the field type mismatches at compile time, which is the primary thing we care about, though I think that should be enabled in all builds instead of being debug only.
The required checks should fold away or produce a compile time error. You can see that jvmci_vmStructs_init compiles into an empty method in the fastdebug build. Anyway, I filed https://bugs.openjdk.org/browse/JDK-8291513 to remove those declarations completely. ------------- PR: https://git.openjdk.org/jdk/pull/9165 From bulasevich at openjdk.org Thu Jul 28 19:50:53 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 28 Jul 2022 19:50:53 GMT Subject: RFR: 8288477: nmethod header size reduction [v4] In-Reply-To: References: Message-ID: <7IIZWiJTwX4rSJMy6te8SIZvZN2Vw0hvOjm5fjS9j7o=.f7540cd0-35bc-49af-a66a-45cff8254231@github.com> On Thu, 28 Jul 2022 09:39:52 GMT, Boris Ulasevich wrote: >> Each compiled method contains an nmethod header. In the trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header padding, and using one byte for the CompilerType and CompLevel values. >> >> Cleanup work: apply CompLevel type where applicable. >> >> The change was tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime >> >> Renaissance benchmarks show no performance regressions on x86 and aarch. > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > declare CompLevel properly in vmStructs Thank you all!
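The idea behind this change (order the fields from largest to smallest, and store small enumerations in one byte) can be illustrated with a minimal, self-contained sketch. The struct names, the field names, and the `Level` enum below are hypothetical and are not taken from HotSpot; exact sizes depend on the platform ABI, so the sketch only asserts that the sorted layout is no larger than the unsorted one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical one-byte enumeration, standing in for a small value such as a
// compilation level. Name and enumerators are illustrative only.
enum class Level : std::int8_t { None = 0, Simple = 1, Full = 4 };

// Fields in "historical" order: small fields are interleaved with 8-byte
// fields, so the compiler usually inserts padding holes to keep alignment.
struct HeaderUnsorted {
  int           entry_bci;       // 4 bytes, usually followed by a hole
  std::uint64_t gc_epoch;        // 8 bytes
  int           level;           // 4 bytes for a value that fits in one
  bool          is_loaded;       // 1 byte, usually followed by a hole
  int           lock_count;      // 4 bytes, usually followed by a hole
  std::int64_t  traversal_mark;  // 8 bytes
};

// Same information, sorted from largest to smallest, with the level stored
// in the one-byte enum. Only tail padding should remain.
struct HeaderSorted {
  std::uint64_t gc_epoch;
  std::int64_t  traversal_mark;
  int           entry_bci;
  int           lock_count;
  Level         level;
  bool          is_loaded;
};

static_assert(sizeof(Level) == 1, "enum should occupy a single byte");
static_assert(sizeof(HeaderSorted) <= sizeof(HeaderUnsorted),
              "sorting by size should never grow the struct");

// Number of bytes saved by the reordering on the current platform.
constexpr std::size_t bytes_saved() {
  return sizeof(HeaderUnsorted) - sizeof(HeaderSorted);
}

// Runtime variant of the same check, usable from a test or a debug build.
inline void verify_layout() {
  assert(sizeof(HeaderSorted) <= sizeof(HeaderUnsorted));
}
```

On a typical 64-bit Linux/x86-64 build this would be expected to give 40 versus 32 bytes, but that figure is ABI-dependent; `ptype /o` in gdb, as used in this thread, is the reliable way to inspect the real layout.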
------------- PR: https://git.openjdk.org/jdk/pull/9165 From bulasevich at openjdk.org Thu Jul 28 19:52:14 2022 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 28 Jul 2022 19:52:14 GMT Subject: Integrated: 8288477: nmethod header size reduction In-Reply-To: References: Message-ID: On Wed, 15 Jun 2022 09:30:59 GMT, Boris Ulasevich wrote: > Each compiled method contains an nmethod header. In the trivial case, the header takes up half the method payload: ~350 bytes. Over time, the header gets bigger. With this change, I suggest sorting the header data fields from largest to smallest to minimize header padding, and using one byte for the CompilerType and CompLevel values. > > Cleanup work: apply CompLevel type where applicable. > > The change was tested with jtreg tier1-3, :hotspot_compiler :hotspot_gc :hotspot_serviceability :hotspot_runtime > > Renaissance benchmarks show no performance regressions on x86 and aarch. This pull request has now been integrated. Changeset: e052d7f4 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/e052d7f4bc0af26e205dbfff07beb06feebf1806 Stats: 135 lines in 11 files changed: 62 ins; 55 del; 18 mod 8288477: nmethod header size reduction Reviewed-by: kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/9165 From kbarrett at openjdk.org Thu Jul 28 20:08:45 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 28 Jul 2022 20:08:45 GMT Subject: RFR: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() [v3] In-Reply-To: References: Message-ID: On Thu, 28 Jul 2022 10:15:41 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that fixes some callers of `G1CollectedHeap::heap_region_containing` to a) fail on trying to get the `HeapRegion*` on uncommitted regions and b) change some callers to use the correct `heap_region_containing_or_null` method that had been intended there. >> >> Also remove some now unnecessary asserts as `heap_region_containing` will fail when it previously returned `nullptr`. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into 8290715-heapregion-containing-no-null > - Some more removal of unnecessary checks > - Initial changes > - Fix the places where the wrong heap_region_containing() method has been used > - kbarrett review > - Remove some more casts > - Revert changes to heap_region_containing() > - Remove cast > - Remove unnecessary changes > - some changes > - ...
and 1 more: https://git.openjdk.org/jdk/compare/97fc8deb...5dc2fc08 Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/9584 From sangheki at openjdk.org Thu Jul 28 20:23:32 2022 From: sangheki at openjdk.org (Sangheon Kim) Date: Thu, 28 Jul 2022 20:23:32 GMT Subject: RFR: 8290966: G1: Record number of PLAB filled and number of direct allocations In-Reply-To: References: Message-ID: On Mon, 25 Jul 2022 14:11:24 GMT, Thomas Schatzl wrote: > Hi all, > > for evaluation in [JDK-8288966](https://bugs.openjdk.org/browse/JDK-8288966) I added statistics output that show the amount of PLAB fills and direct allocations; I think this is useful for similar evaluations in the future, so I kept and split it out from that change. Adds these values to the existing JFR event too. > > Testing: PLAB related tests, gha > > Thanks, > Thomas Lgtm. ------------- Marked as reviewed by sangheki (Reviewer). PR: https://git.openjdk.org/jdk/pull/9626 From kbarrett at openjdk.org Thu Jul 28 20:32:33 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 28 Jul 2022 20:32:33 GMT Subject: RFR: 8290966: G1: Record number of PLAB filled and number of direct allocations In-Reply-To: References: Message-ID: On Mon, 25 Jul 2022 14:11:24 GMT, Thomas Schatzl wrote: > Hi all, > > for evaluation in [JDK-8288966](https://bugs.openjdk.org/browse/JDK-8288966) I added statistics output that show the amount of PLAB fills and direct allocations; I think this is useful for similar evaluations in the future, so I kept and split it out from that change. Adds these values to the existing JFR event too. > > Testing: PLAB related tests, gha > > Thanks, > Thomas Looks good. Just a couple whitespace comments. 
src/hotspot/share/gc/shared/gcHeapSummary.hpp line 194: > 192: G1EvacSummary(size_t allocated, size_t wasted, size_t undo_wasted, size_t unused, > 193: size_t used, size_t region_end_waste, uint regions_filled, size_t num_plab_filled, > 194: size_t direct_allocated, size_t num_direct_allocated, size_t failure_used, size_t failure_waste) : [pre-existing] Shouldn't these parameters be indented to the parameter list start? As it is, it's hard to spot where the parameters end and the initializers start. src/hotspot/share/gc/shared/gcHeapSummary.hpp line 197: > 195: _allocated(allocated), _wasted(wasted), _undo_wasted(undo_wasted), _unused(unused), > 196: _used(used), _region_end_waste(region_end_waste), _regions_filled(regions_filled), > 197: _num_plab_filled(num_plab_filled), _direct_allocated(direct_allocated), _num_direct_allocated(num_direct_allocated), Maybe another line break in this line, to keep its length a bit more reasonable? ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/9626 From duke at openjdk.org Fri Jul 29 03:47:34 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Fri, 29 Jul 2022 03:47:34 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: <0TH2Cv2t4pTvoEZ9c4MLAtMqTWF2_tHYwFq-Z_pmbbQ=.5ab14625-26c7-4cb8-914e-51b3059a69fb@github.com> Message-ID: On Thu, 28 Jul 2022 18:17:27 GMT, Jatin Bhateja wrote: >> I have re-measured this, and it seems that only the non-VEX encoded version receives the special dependency treatment, so I reverted this change and added a comment for clarification. >> >> The optimisation is noted in [The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers](https://www.agner.org/optimize/) for several architectures, for example in section 9.8 (Register allocation and renaming in Sandy Bridge and Ivy Bridge pipeline).
>> >> I have performed measurements on uica.uops.info. While this sequence gives 1.37 cycles/iteration on Skylake and Icelake >> >> pcmpeqd xmm0, xmm0 >> paddd xmm0, xmm1 >> paddd xmm0, xmm1 >> paddd xmm0, xmm1 >> >> this version has a throughput of 4 cycles/iteration >> >> vpcmpeqd xmm0, xmm0, xmm0 >> vpaddd xmm0, xmm1, xmm0 >> vpaddd xmm0, xmm1, xmm0 >> vpaddd xmm0, xmm1, xmm0 >> >> This indicates that `vpcmpeqd` fails to break the dependency on `xmm0`, as opposed to the `pcmpeqd` instruction. >> >> Thanks. > > Both of the above JIT sequences have a true dependency chain, so there is no scope for any additional architecture-imposed false dependency to cause further perf degradation, which is what we use dep-breaking idioms for. I'm sorry, I don't quite understand what you mean here. What I meant is that while `pcmpeqd xmmk, xmmk` is a dep-breaking idiom, `vpcmpeqd xmmk, xmmk, xmmk` seems not to be. As a result, I reverted that change, and in this context the only change is that I added a branch for non-AVX machines. Please review this patch. Thank you very much. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Fri Jul 29 05:20:38 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 29 Jul 2022 05:20:38 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v8] In-Reply-To: References: <0TH2Cv2t4pTvoEZ9c4MLAtMqTWF2_tHYwFq-Z_pmbbQ=.5ab14625-26c7-4cb8-914e-51b3059a69fb@github.com> Message-ID: On Fri, 29 Jul 2022 03:44:16 GMT, Quan Anh Mai wrote: >> Both of the above JIT sequences have a true dependency chain, so there is no scope for any additional architecture-imposed false dependency to cause further perf degradation, which is what we use dep-breaking idioms for. > > I'm sorry, I don't quite understand what you mean here. What I meant is that while `pcmpeqd xmmk, xmmk` is a dep-breaking idiom, `vpcmpeqd xmmk, xmmk, xmmk` seems not to be.
As a result, I reverted that change and in this context, the only change is I added a branch for non-AVX machines. Please have a review for this patch. Thank you very much. Yes, its a valid one-idiom and as per section E.1.2 of [X86 Optimization manual](https://cdrdv2.intel.com/v1/dl/getContent/671488) such idioms are resolved by renamer and does not reach execution ports. I faintly remember that there was a subtle difference b/w handling of zeroing/one idioms on certain targets where in some cases one-idioms still go beyond renamer. But, we can keep this change of your since even if all-one idiom (vpcmpeqd) reach execution port, latency wise it's same as vpternlog over 256 bit vector. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Fri Jul 29 08:27:47 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 29 Jul 2022 08:27:47 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: Message-ID: On Wed, 27 Jul 2022 09:40:45 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the generation of broadcasting a scalar in several ways: >> >> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. >> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high. 
>> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay
>>
>> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follow:
>>
>>                                       Before            After
>> Benchmark                  Mode  Cnt   Score   Error     Score   Error  Units    Gain
>> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598    38.771 ± 0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464    38.603 ± 0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791    13.755 ± 0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781    13.663 ± 0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases.
>>
>> This patch also removes some redundant code paths and renames some incorrectly named instructions.
>>
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   unnecessary TEMP dst

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1651:

> 1649: case 32: vmovdqu(dst, src); break;
> 1650: case 64: evmovdqul(dst, src, Assembler::AVX_512bit); break;
> 1651: default: ShouldNotReachHere();

No change in this file, may be you can remove it from change set.

src/hotspot/cpu/x86/x86.ad line 4141:

> 4139: instruct ReplB_mem(vec dst, memory mem) %{
> 4140: predicate(VM_Version::supports_avx2());
> 4141: match(Set dst (ReplicateB (LoadB mem)));

Merge these rules and create a macro assembly routine for encoding block logic.

src/hotspot/cpu/x86/x86.ad line 4159:

> 4157:
> 4158: instruct vReplS_reg(vec dst, rRegI src) %{
> 4159: predicate(UseAVX >= 2);

Can be folded with below pattern, by pushing predicate into encoding block.
src/hotspot/cpu/x86/x86.ad line 4188: > 4186: assert(vlen == 8, ""); > 4187: __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); > 4188: } Please move this into macro assembly routine, it will look cleaner that way, after merging with above rule. src/hotspot/cpu/x86/x86.ad line 4253: > 4251: int vlen_enc = vector_length_encoding(this); > 4252: if (VM_Version::supports_avx()) { > 4253: __ vbroadcastss($dst$$XMMRegister, addr, vlen_enc); Emitting vbroadcastss for all the vector sizes for Replicate[B/S/I] may result into domain switch over penalty, can be limited to only <=16 bytes replications and above that we can emit VPBROADCASTD. src/hotspot/cpu/x86/x86.ad line 4261: > 4259: __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); > 4260: } > 4261: } Please move into a new macro-assembly routine. src/hotspot/cpu/x86/x86.ad line 4407: > 4405: __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); > 4406: } > 4407: } Please move to a new macro assembly routine. src/hotspot/cpu/x86/x86.ad line 4497: > 4495: __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); > 4496: } > 4497: } Same as above. src/hotspot/cpu/x86/x86.ad line 4541: > 4539: instruct ReplD_reg(vec dst, vlRegD src) %{ > 4540: predicate(UseSSE < 3); > 4541: match(Set dst (ReplicateD src)); Pushing predicates into encoding can fold these patterns. src/hotspot/cpu/x86/x86.ad line 4579: > 4577: if (Matcher::vector_length_in_bytes(this) >= 16) { > 4578: __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister); > 4579: } Macro-assembly routine. src/hotspot/share/opto/machnode.cpp line 478: > 476: // Stretching lots of inputs - don't do it. > 477: // A MachContant has the last input being the constant base > 478: if (req() > (is_MachConstant() ? 3U : 2U)) { Earlier some of the nodes like add/sub/mul/divF_imm which were carrying 3 inputs were not getting cloned, now with change we may see them getting rematerialized before uses which may increase code size but of course it will reduced interferences. 
With earlier cap of 2 only Replicates were passing this check. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Fri Jul 29 08:27:49 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 29 Jul 2022 08:27:49 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: Message-ID: <-u0cr_joNl-C5Zu_27nHrdsDFqrUGKo0ygD32PhAwJU=.cd688fed-5606-4a92-b51e-d0cd99fffa6d@github.com> On Fri, 29 Jul 2022 07:51:04 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> unnecessary TEMP dst > > src/hotspot/share/opto/machnode.cpp line 478: > >> 476: // Stretching lots of inputs - don't do it. >> 477: // A MachContant has the last input being the constant base >> 478: if (req() > (is_MachConstant() ? 3U : 2U)) { > > Earlier some of the nodes like add/sub/mul/divF_imm which were carrying 3 inputs were not getting cloned, now with change we may see them getting rematerialized before uses which may increase code size but of course it will reduced interferences. With earlier cap of 2 only Replicates were passing this check. Saving a spill at the cost of re-materialization using a comparatively cheaper instruction like add/sub/mul looks better for divD may be costly. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Fri Jul 29 08:27:49 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 29 Jul 2022 08:27:49 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: <-u0cr_joNl-C5Zu_27nHrdsDFqrUGKo0ygD32PhAwJU=.cd688fed-5606-4a92-b51e-d0cd99fffa6d@github.com> References: <-u0cr_joNl-C5Zu_27nHrdsDFqrUGKo0ygD32PhAwJU=.cd688fed-5606-4a92-b51e-d0cd99fffa6d@github.com> Message-ID: On Fri, 29 Jul 2022 08:00:21 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/machnode.cpp line 478: >> >>> 476: // Stretching lots of inputs - don't do it. 
>>> 477: // A MachContant has the last input being the constant base >>> 478: if (req() > (is_MachConstant() ? 3U : 2U)) { >> >> Earlier some of the nodes like add/sub/mul/divF_imm which were carrying 3 inputs were not getting cloned, now with change we may see them getting rematerialized before uses which may increase code size but of course it will reduced interferences. With earlier cap of 2 only Replicates were passing this check. > > Saving a spill at the cost of re-materialization using a comparatively cheaper instruction like add/sub/mul looks better for divD may be costly. There are other machine nodes which just accept constants as a mode, like vround* and vcompu* nodes which will now qualify for rematerlization leading to emitting high cost instructions. ------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Fri Jul 29 08:27:50 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 29 Jul 2022 08:27:50 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: <-u0cr_joNl-C5Zu_27nHrdsDFqrUGKo0ygD32PhAwJU=.cd688fed-5606-4a92-b51e-d0cd99fffa6d@github.com> Message-ID: On Fri, 29 Jul 2022 08:15:11 GMT, Jatin Bhateja wrote: >> Saving a spill at the cost of re-materialization using a comparatively cheaper instruction like add/sub/mul looks better for divD may be costly. > > There are other machine nodes which just accept constants as a mode, like vround* and vcompu* nodes which will now qualify for rematerlization leading to emitting high cost instructions. I think we should have a rough cost model here and not just basing it purely over connectivity of the node, or for the time being you can remove this change ? 
------------- PR: https://git.openjdk.org/jdk/pull/7832 From sangheki at openjdk.org Fri Jul 29 13:40:48 2022 From: sangheki at openjdk.org (Sangheon Kim) Date: Fri, 29 Jul 2022 13:40:48 GMT Subject: RFR: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() [v3] In-Reply-To: References: Message-ID: On Thu, 28 Jul 2022 10:15:41 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that fixes some callers of `G1CollectedHeap::heap_region_containing` to a) fail on trying to get the `Heapregion*` on uncommitted regions and b) change some callers to use the correct `heap_region_containing_or_null` method that had been intended there. >> >> Also remove some now unneccessary asserts as `heap_region_containing` will fail when it previously returned `nullptr`. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into 8290715-heapregion-containing-no-null > - Some more removal of unnecessary checks > - Initial changes > - Fix the places where the wrong heap_region_containing() method has been used > - kbarrett review > - Remove some more casts > - Revert changes to heap_region_containing() > - Remove cast > - Remove unnecessary changes > - some changes > - ... and 1 more: https://git.openjdk.org/jdk/compare/97fc8deb...5dc2fc08 Looks good. ------------- Marked as reviewed by sangheki (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9584

From duke at openjdk.org Fri Jul 29 13:48:07 2022
From: duke at openjdk.org (Quan Anh Mai)
Date: Fri, 29 Jul 2022 13:48:07 GMT
Subject: RFR: 8283232: x86: Improve vector broadcast operations [v13]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch improves the generation of broadcasting a scalar in several ways:
>
> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines.
> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high.
> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay
>
> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follow:
>
>                                       Before            After
> Benchmark                  Mode  Cnt   Score   Error     Score   Error  Units    Gain
> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598    38.771 ± 0.797  ns/op   +9.03%
> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464    38.603 ± 0.367  ns/op   +8.62%
> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791    13.755 ± 0.375  ns/op  +33.17%
> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781    13.663 ± 0.387  ns/op  +23.22%
>
> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases.
>
> This patch also removes some redundant code paths and renames some incorrectly named instructions.
>
> Thank you very much.
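
[Editorial sketch] The Gain column in the benchmark table above appears to be the relative reduction in ns/op, that is (before - after) / before. The mail does not state the formula, so this is an assumption; the class name `BenchmarkGain` and the method `gainPercent` are hypothetical and exist only for this illustration. The scores are copied from the table.

```java
import java.util.Locale;

class BenchmarkGain {
    // Assumed formula: relative reduction in time per operation, in percent.
    static double gainPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores (ns/op) copied from the quoted benchmark table.
        String[] names = {"testDouble", "testFloat", "testInt", "testLong"};
        double[] before = {42.621, 42.245, 20.581, 17.794};
        double[] after = {38.771, 38.603, 13.755, 13.663};
        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-10s %+.2f%%%n",
                    names[i], gainPercent(before[i], after[i]));
        }
    }
}
```

Under that assumption the computed values round to the figures in the Gain column. The error terms for `testInt` and `testLong` in the Before column are large relative to the scores, so the gains for those two rows are probably less precise than two decimals suggest.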
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add load_constant_vector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7832/files - new: https://git.openjdk.org/jdk/pull/7832/files/bc01c21b..e83ccaab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7832&range=11-12 Stats: 80 lines in 3 files changed: 35 ins; 36 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/7832.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7832/head:pull/7832 PR: https://git.openjdk.org/jdk/pull/7832 From duke at openjdk.org Fri Jul 29 13:48:13 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Fri, 29 Jul 2022 13:48:13 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 05:24:19 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> unnecessary TEMP dst > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1651: > >> 1649: case 32: vmovdqu(dst, src); break; >> 1650: case 64: evmovdqul(dst, src, Assembler::AVX_512bit); break; >> 1651: default: ShouldNotReachHere(); > > No change in this file, may be you can remove it from change set. Since I added the method `C2_MacroAssembler::load_constant_vector` near here anyway I think this style change can be kept. > src/hotspot/cpu/x86/x86.ad line 4159: > >> 4157: >> 4158: instruct vReplS_reg(vec dst, rRegI src) %{ >> 4159: predicate(UseAVX >= 2); > > Can be folded with below pattern, by pushing predicate into encoding block. Aligning the predicate of the reg and the mem version allows the adlc parser to recognise their relationship and during register allocation can substitute a reg operation with a spilt operand with its corresponding mem node. 
You can see in the generated code the reg node has specific methods such as `cisc_operand` and `cisc_version`

> src/hotspot/cpu/x86/x86.ad line 4253:
>
>> 4251: int vlen_enc = vector_length_encoding(this);
>> 4252: if (VM_Version::supports_avx()) {
>> 4253: __ vbroadcastss($dst$$XMMRegister, addr, vlen_enc);
>
> Emitting vbroadcastss for all the vector sizes for Replicate[B/S/I] may result into domain switch over penalty, can be limited to only <=16 bytes replications and above that we can emit VPBROADCASTD.

Got it

> src/hotspot/cpu/x86/x86.ad line 4261:
>
>> 4259: __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister);
>> 4260: }
>> 4261: }
>
> Please move into a new macro-assembly routine.

Done

-------------

PR: https://git.openjdk.org/jdk/pull/7832

From duke at openjdk.org Fri Jul 29 14:00:28 2022
From: duke at openjdk.org (Quan Anh Mai)
Date: Fri, 29 Jul 2022 14:00:28 GMT
Subject: RFR: 8283232: x86: Improve vector broadcast operations [v2]
In-Reply-To:
References: <1FBk3MauXFxUsyHz9kuhqGI-CtLRgHYmHn1eyyaDLvs=.6d4d94b0-32a0-42dc-a181-87df8d8f3b65@github.com>
Message-ID:

On Wed, 16 Mar 2022 17:25:53 GMT, Jatin Bhateja wrote:

>>> Hi, forwarding results within the same bypass domain does not result in delay, data bypass delay happens when the data crosses different domains, according to "Intel® 64 and IA-32 Architectures Optimization Reference Manual"
>>>
>>> > When a source of a micro-op executed in one stack comes from a micro-op executed in another stack, a delay can occur. The delay occurs also for transitions between Intel SSE integer and Intel SSE floating-point operations. In some of the cases, the data transition is done using a micro-op that is added to the instruction flow.
>>>
>>> The manual mentions the guideline at section 3.5.2.2
>>>
>>> ![image](https://user-images.githubusercontent.com/49088128/158618209-c0674ba7-1c93-4014-a7e1-330f4e5846da.png)
>>>
>>> Thanks.
>>
>> Thanks meant to refer to above text. I have removed incorrect reference.
>
>> > Hi, forwarding results within the same bypass domain does not result in delay, data bypass delay happens when the data crosses different domains, according to "Intel® 64 and IA-32 Architectures Optimization Reference Manual"
>> > > When a source of a micro-op executed in one stack comes from a micro-op executed in another stack, a delay can occur. The delay occurs also for transitions between Intel SSE integer and Intel SSE floating-point operations. In some of the cases, the data transition is done using a micro-op that is added to the instruction flow.
>> >
>> >
>> > The manual mentions the guideline at section 3.5.2.2
>> > ![image](https://user-images.githubusercontent.com/49088128/158618209-c0674ba7-1c93-4014-a7e1-330f4e5846da.png)
>> > Thanks.
>>
>> Thanks meant to refer to above text. I have removed incorrect reference.
>
> It will still be good if we can come up with a micro benchmark, that shows the gain with the patch.

@jatin-bhateja Thanks a lot for your comments, I have addressed those in the last commit.
@vnkozlov Thanks very much for the review and testing.

-------------

PR: https://git.openjdk.org/jdk/pull/7832

From duke at openjdk.org Fri Jul 29 14:00:29 2022
From: duke at openjdk.org (Quan Anh Mai)
Date: Fri, 29 Jul 2022 14:00:29 GMT
Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12]
In-Reply-To:
References: <-u0cr_joNl-C5Zu_27nHrdsDFqrUGKo0ygD32PhAwJU=.cd688fed-5606-4a92-b51e-d0cd99fffa6d@github.com>
Message-ID:

On Fri, 29 Jul 2022 08:17:23 GMT, Jatin Bhateja wrote:

>> There are other machine nodes which just accept constants as a mode, like vround* and vcompu* nodes which will now qualify for rematerlization leading to emitting high cost instructions.
>
> I think we should have a rough cost model here and not just basing it purely over connectivity of the node, or for the time being you can remove this change ?
A node being decided to prefer rematerialising to spilling has to satisfy that:

- The node is not explicitly said to be expensive, `divD` and `divF` fails at this stage.
- The node declaration only contains simple register rules (explicit or implicit DEF dst and USE src), `vround` fails this because it has temp register, `cmpF_imm` and `cmpD_imm` fail this because they kill flags.
- This method we are at agrees with the rematerialising.

I have looked at all instances where `constantaddress` is used and found no node where accidental rematerialisation is inefficient.

-------------

PR: https://git.openjdk.org/jdk/pull/7832

From tschatzl at openjdk.org Fri Jul 29 15:44:39 2022
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 29 Jul 2022 15:44:39 GMT
Subject: RFR: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 28 Jul 2022 20:04:43 GMT, Kim Barrett wrote:

>> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>>
>> - Merge branch 'master' into 8290715-heapregion-containing-no-null
>> - Some more removal of unnecessary checks
>> - Initial changes
>> - Fix the places where the wrong heap_region_containing() method has been used
>> - kbarrett review
>> - Remove some more casts
>> - Revert changes to heap_region_containing()
>> - Remove cast
>> - Remove unnecessary changes
>> - some changes
>> - ... and 1 more: https://git.openjdk.org/jdk/compare/97fc8deb...5dc2fc08
>
> Looks good.

Thanks @kimbarrett @sangheon for your reviews.
------------- PR: https://git.openjdk.org/jdk/pull/9584 From tschatzl at openjdk.org Fri Jul 29 15:48:43 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 29 Jul 2022 15:48:43 GMT Subject: Integrated: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() In-Reply-To: References: Message-ID: <9hC86H3VKrQ6Jg5TmS4yZNBoqExANdv2i7tUvlORwxE=.1e655863-0d3a-42e7-bfc1-098c837a35b6@github.com> On Thu, 21 Jul 2022 08:06:04 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that fixes some callers of `G1CollectedHeap::heap_region_containing` to a) fail on trying to get the `Heapregion*` on uncommitted regions and b) change some callers to use the correct `heap_region_containing_or_null` method that had been intended there. > > Also remove some now unneccessary asserts as `heap_region_containing` will fail when it previously returned `nullptr`. > > Testing: tier1-5 > > Thanks, > Thomas This pull request has now been integrated. Changeset: f58e08e2 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/f58e08e2585186e1b3ca2cad20b342d83a8ab133 Stats: 23 lines in 5 files changed: 1 ins; 8 del; 14 mod 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing() Reviewed-by: kbarrett, sangheki ------------- PR: https://git.openjdk.org/jdk/pull/9584 From hseigel at openjdk.org Fri Jul 29 18:12:38 2022 From: hseigel at openjdk.org (Harold Seigel) Date: Fri, 29 Jul 2022 18:12:38 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information Message-ID: Please review this change to fix JDK-8291360. This fix adds entry points getClassFileVersion() and getClassAccessFlagsRaw() to class java.lang.Class. The new entry points return the current class's class file version and its raw access flags. The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 1-3 on Linux x64. 
Additionally, the JCK lang, vm, and api tests and new regression tests were run locally on Linux x64. Thanks, Harold ------------- Commit messages: - 8291360: Create entry points to expose low-level class file information Changes: https://git.openjdk.org/jdk/pull/9688/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9688&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291360 Stats: 704 lines in 9 files changed: 703 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9688.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9688/head:pull/9688 PR: https://git.openjdk.org/jdk/pull/9688 From darcy at openjdk.org Fri Jul 29 18:15:15 2022 From: darcy at openjdk.org (Joe Darcy) Date: Fri, 29 Jul 2022 18:15:15 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 18:02:46 GMT, Harold Seigel wrote: > Please review this change to fix JDK-8291360. This fix adds entry points getClassFileVersion() and getClassAccessFlagsRaw() to class java.lang.Class. The new entry points return the current class's class file version and its raw access flags. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 1-3 on Linux x64. Additionally, the JCK lang, vm, and api tests and new regression tests were run locally on Linux x64. > > Thanks, Harold src/java.base/share/classes/java/lang/Class.java line 4700: > 4698: * returned. If the class is a primitive then ACC_ABSTRACT | ACC_FINAL | ACC_PUBLIC. > 4699: */ > 4700: private int getClassAccessFlagsRaw() { For a "raw" method, it might be better to return the flags on the array class object itself rather than loop down to the component type. 
-------------

PR: https://git.openjdk.org/jdk/pull/9688

From jbhateja at openjdk.org Fri Jul 29 19:03:41 2022
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 29 Jul 2022 19:03:41 GMT
Subject: RFR: 8283232: x86: Improve vector broadcast operations [v13]
In-Reply-To:
References:
Message-ID:

On Fri, 29 Jul 2022 13:48:07 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch improves the generation of broadcasting a scalar in several ways:
>>
>> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines.
>> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high.
>> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay
>>
>> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follow:
>>
>>                                       Before            After
>> Benchmark                  Mode  Cnt   Score   Error     Score   Error  Units    Gain
>> SpiltReplicate.testDouble  avgt    5  42.621 ± 0.598    38.771 ± 0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5  42.245 ± 1.464    38.603 ± 0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5  20.581 ± 5.791    13.755 ± 0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5  17.794 ± 4.781    13.663 ± 0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases.
>>
>> This patch also removes some redundant code paths and renames some incorrectly named instructions.
>>
>> Thank you very much.
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add load_constant_vector Marked as reviewed by jbhateja (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Fri Jul 29 19:03:44 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 29 Jul 2022 19:03:44 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: Message-ID: <6iKZmw4KDQl0pDrDHsfwotJgb4I_wpVSPKVjXcn9eHI=.125bb12d-b8eb-43b1-9591-68b1bc445e70@github.com> On Fri, 29 Jul 2022 13:39:31 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/x86.ad line 4159: >> >>> 4157: >>> 4158: instruct vReplS_reg(vec dst, rRegI src) %{ >>> 4159: predicate(UseAVX >= 2); >> >> Can be folded with below pattern, by pushing predicate into encoding block. > > Aligning the predicate of the reg and the mem version allows the adlc parser to recognise their relationship and during register allocation can substitute a reg operation with a spilt operand with its corresponding mem node. You can see in the generated code the reg node has specific methods such as `cisc_operand` and `cisc_version` May be a misplaced comment, what I meant was to collapse patterns if number and register class of operands comply. 
------------- PR: https://git.openjdk.org/jdk/pull/7832 From jbhateja at openjdk.org Fri Jul 29 19:03:46 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 29 Jul 2022 19:03:46 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v12] In-Reply-To: References: <-u0cr_joNl-C5Zu_27nHrdsDFqrUGKo0ygD32PhAwJU=.cd688fed-5606-4a92-b51e-d0cd99fffa6d@github.com> Message-ID: <4wrDjVMC9f4fR5Uue_RvEF8zQzhfZkswGVf67wXYTpM=.f93f9fc7-9ab4-4524-8287-8cb317fcf33e@github.com> On Fri, 29 Jul 2022 13:55:39 GMT, Quan Anh Mai wrote: >> I think we should have a rough cost model here and not just basing it purely over connectivity of the node, or for the time being you can remove this change ? > > A node being decided to prefer rematerialising to spilling has to satisfy that: > > - The node is not explicitly said to be expensive, `divD` and `divF` fails at this stage. > - The node declaration only contains simple register rules (explicit or implicit DEF dst and USE src), `vround` fails this because it has temp register, `cmpF_imm` and `cmpD_imm` fail this because they kill flags. > - This method we are at agrees with the rematerialising. > > I have looked at all instances where `constantaddress` is used and found no node where accidental rematerialisation is inefficient. Thanks for your explanations, I agree. 
------------- PR: https://git.openjdk.org/jdk/pull/7832 From gziemski at openjdk.org Fri Jul 29 19:06:43 2022 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 29 Jul 2022 19:06:43 GMT Subject: RFR: 8290840: Refactor the "os" class [v2] In-Reply-To: <8vUXDokoOuVschkyCtu5BNpv8ZYIv01RllzKWhdZdXQ=.aa253ce1-7e3b-4efc-9b90-930a2a9b3ea5@github.com> References: <8vUXDokoOuVschkyCtu5BNpv8ZYIv01RllzKWhdZdXQ=.aa253ce1-7e3b-4efc-9b90-930a2a9b3ea5@github.com> Message-ID: <29bp6K9fT5FB6NuMaSRg9bKIp66HaXG_iza8hPSTENE=.c9487746-812c-45c3-aa01-0baa5ebb4483@github.com> On Mon, 25 Jul 2022 22:54:15 GMT, Ioi Lam wrote: >> Please see [JDK-8290840](https://bugs.openjdk.org/browse/JDK-8290840) for the detailed proposal. >> >> The `os` class, declared in os.hpp, forms the major part of the HotSpot porting interface. Its structure has gradually deteriorated over the years as new ports are created and new APIs are added. >> >> This RFE tries to address the following: >> >> - Clearly specify where a porting API should be declared and defined among the various `os*.cpp` and `os*.hpp` files. >> - Avoid the inappropriate inclusion of OS-specific APIs (such as the `os::Linux class`) by platform-independent source files. > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - moved os::{print_active_locale, print_user_info} to os_posix.cpp > - Fixed os.hpp comments per @dholmes-ora review Looks good, however, I would prefer not to have to provide the default implementations for `register_code_area()` or `resolve_function_descriptor()` src/hotspot/share/runtime/os.inline.hpp line 35: > 33: > 34: // Below are inline functions that are rarely implemented by the platforms. > 35: // Provide default emptyy implementation. 
Typo: emptyy --> empty src/hotspot/share/runtime/os.inline.hpp line 55: > 53: return true; > 54: } > 55: #endif We could use some custom define, similar to `HAVE_FUNCTION_DESCRIPTORS` and remove this default impl just like with `resolve_function_descriptor()`? src/hotspot/share/runtime/os.inline.hpp line 60: > 58: inline void* os::resolve_function_descriptor(void* p) { > 59: return NULL; > 60: } This is unnecessary and could be removed if we did: +#ifdef HAVE_FUNCTION_DESCRIPTORS // Used only on PPC. inline static void* resolve_function_descriptor(void* p); +#endif in `os.hpp` ------------- Changes requested by gziemski (Committer). PR: https://git.openjdk.org/jdk/pull/9600 From hseigel at openjdk.org Fri Jul 29 19:56:51 2022 From: hseigel at openjdk.org (Harold Seigel) Date: Fri, 29 Jul 2022 19:56:51 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 18:12:22 GMT, Joe Darcy wrote: >> Please review this change to fix JDK-8291360. This fix adds entry points getClassFileVersion() and getClassAccessFlagsRaw() to class java.lang.Class. The new entry points return the current class's class file version and its raw access flags. >> >> The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 1-3 on Linux x64. Additionally, the JCK lang, vm, and api tests and new regression tests were run locally on Linux x64. >> >> Thanks, Harold > > src/java.base/share/classes/java/lang/Class.java line 4700: > >> 4698: * returned. If the class is a primitive then ACC_ABSTRACT | ACC_FINAL | ACC_PUBLIC. >> 4699: */ >> 4700: private int getClassAccessFlagsRaw() { > > For a "raw" method, it might be better to return the flags on the array class object itself rather than loop down to the component type. There's no bytecode stream for arrays. It's created using anewarray with a dimension operand and a cp pointer to the component type. 
So there are no flags for the array object. ------------- PR: https://git.openjdk.org/jdk/pull/9688 From iklam at openjdk.org Fri Jul 29 20:31:42 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 29 Jul 2022 20:31:42 GMT Subject: RFR: 8290840: Refactor the "os" class [v3] In-Reply-To: References: Message-ID: > Please see [JDK-8290840](https://bugs.openjdk.org/browse/JDK-8290840) for the detailed proposal. > > The `os` class, declared in os.hpp, forms the major part of the HotSpot porting interface. Its structure has gradually deteriorated over the years as new ports are created and new APIs are added. > > This RFE tries to address the following: > > - Clearly specify where a porting API should be declared and defined among the various `os*.cpp` and `os*.hpp` files. > - Avoid the inappropriate inclusion of OS-specific APIs (such as the `os::Linux class`) by platform-independent source files. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9600/files - new: https://git.openjdk.org/jdk/pull/9600/files/9f853295..560e67c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9600&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9600&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9600.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9600/head:pull/9600 PR: https://git.openjdk.org/jdk/pull/9600 From iklam at openjdk.org Fri Jul 29 20:31:45 2022 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 29 Jul 2022 20:31:45 GMT Subject: RFR: 8290840: Refactor the "os" class [v2] In-Reply-To: <29bp6K9fT5FB6NuMaSRg9bKIp66HaXG_iza8hPSTENE=.c9487746-812c-45c3-aa01-0baa5ebb4483@github.com> References: <8vUXDokoOuVschkyCtu5BNpv8ZYIv01RllzKWhdZdXQ=.aa253ce1-7e3b-4efc-9b90-930a2a9b3ea5@github.com> 
<29bp6K9fT5FB6NuMaSRg9bKIp66HaXG_iza8hPSTENE=.c9487746-812c-45c3-aa01-0baa5ebb4483@github.com> Message-ID: On Fri, 29 Jul 2022 18:51:29 GMT, Gerard Ziemski wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - moved os::{print_active_locale, print_user_info} to os_posix.cpp >> - Fixed os.hpp comments per @dholmes-ora review > > src/hotspot/share/runtime/os.inline.hpp line 60: > >> 58: inline void* os::resolve_function_descriptor(void* p) { >> 59: return NULL; >> 60: } > > This is unnecessary and could be removed if we did: > > > +#ifdef HAVE_FUNCTION_DESCRIPTORS > // Used only on PPC. > inline static void* resolve_function_descriptor(void* p); > +#endif > > > in `os.hpp` After this PR, os.hpp no longer includes any os- or cpu-specific os_xxx.hpp files. I did this to avoid including large amount of os-specific declarations (such as os::Linux, which is about 400 lines) into generic source files, which aren't interested in these declarations. As a result of this change, there's no way to let os.hpp know whether `HAVE_FUNCTION_DESCRIPTORS` is defined for the current OS+CPU. 
If we want to do that, we need to either: - include os_xxx.hpp back in os.hpp (which I want to avoid) - add new header files such as `os_.defs.hpp` `os__.defs.hpp` to define `HAVE_FUNCTION_DESCRIPTORS` (this will result in lots of header files that usually have nothing inside) Since there are only 4 functions that fall under this pattern (functions that are used in generic code but most platforms have dummy implementations), I think what I have is a good compromise ------------- PR: https://git.openjdk.org/jdk/pull/9600 From rriggs at openjdk.org Fri Jul 29 21:25:47 2022 From: rriggs at openjdk.org (Roger Riggs) Date: Fri, 29 Jul 2022 21:25:47 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 18:02:46 GMT, Harold Seigel wrote: > Please review this change to fix JDK-8291360. This fix adds entry points getClassFileVersion() and getClassAccessFlagsRaw() to class java.lang.Class. The new entry points return the current class's class file version and its raw access flags. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 1-3 on Linux x64. Additionally, the JCK lang, vm, and api tests and new regression tests were run locally on Linux x64. > > Thanks, Harold src/hotspot/share/prims/jvm.cpp line 4059: > 4057: return JVM_CLASSFILE_MAJOR_VERSION; > 4058: } > 4059: assert(!java_lang_Class::as_Klass(mirror)->is_array_klass(), "unexpected array class"); Can this throw IllegalArgumentException instead. Asserts only report problems when built with debug (Right?) test/hotspot/jtreg/runtime/ClassFile/ClassAccessFlagsRawTest.java line 59: > 57: // test primitive array. should return ACC_ABSTRACT | ACC_FINAL | ACC_PUBLIC. > 58: int flags = (int)m.invoke((new int[3]).getClass()); > 59: if (flags != 1041) { Can this be a hex constant, It's easier to understand the bits. Or assemble the flags from java.lang.reflect.Modifier.XXX static fields. 
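For reference, the decimal 1041 checked in the test is the OR of the three flags named in the test comment. Here is a minimal sketch of the "assemble the flags from `java.lang.reflect.Modifier`" suggestion. The class name `AccessFlagsHex` and the helper `hex` are made up for illustration and are not part of the PR:

```java
import java.lang.reflect.Modifier;

public class AccessFlagsHex {
    // ACC_ABSTRACT | ACC_FINAL | ACC_PUBLIC, built from the public Modifier
    // constants (0x400, 0x10 and 0x1 respectively).
    static final int EXPECTED_PRIMITIVE_ARRAY_FLAGS =
            Modifier.ABSTRACT | Modifier.FINAL | Modifier.PUBLIC;

    // Illustrative helper: render flags in hex so individual bits are easy to read.
    static String hex(int flags) {
        return "0x" + Integer.toHexString(flags);
    }

    public static void main(String[] args) {
        System.out.println(EXPECTED_PRIMITIVE_ARRAY_FLAGS);      // prints 1041
        System.out.println(hex(EXPECTED_PRIMITIVE_ARRAY_FLAGS)); // prints 0x411
    }
}
```

Written this way, a failure message can report both the expected and the actual value in hex, which makes it easier to see which bit differs.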
test/hotspot/jtreg/runtime/ClassFile/classAccessFlagsRaw.jcod line 25: > 23: */ > 24: > 25: // Class with ACC_SUPER set Can these classes be defined more succinctly either in Java or .asm? ------------- PR: https://git.openjdk.org/jdk/pull/9688 From rriggs at openjdk.org Fri Jul 29 21:25:48 2022 From: rriggs at openjdk.org (Roger Riggs) Date: Fri, 29 Jul 2022 21:25:48 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 21:05:19 GMT, Roger Riggs wrote: >> Please review this change to fix JDK-8291360. This fix adds entry points getClassFileVersion() and getClassAccessFlagsRaw() to class java.lang.Class. The new entry points return the current class's class file version and its raw access flags. >> >> The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 1-3 on Linux x64. Additionally, the JCK lang, vm, and api tests and new regression tests were run locally on Linux x64. >> >> Thanks, Harold > > src/hotspot/share/prims/jvm.cpp line 4059: > >> 4057: return JVM_CLASSFILE_MAJOR_VERSION; >> 4058: } >> 4059: assert(!java_lang_Class::as_Klass(mirror)->is_array_klass(), "unexpected array class"); > > Can this throw IllegalArgumentException instead. > Asserts only report problems when built with debug (Right?) Or, Can the VM do this traversal as/more efficiently than doing at the java level? ------------- PR: https://git.openjdk.org/jdk/pull/9688 From kvn at openjdk.org Fri Jul 29 22:19:36 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 29 Jul 2022 22:19:36 GMT Subject: RFR: 8290062: Remove nmethodLocker for nmethods on-stack In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 07:49:09 GMT, Axel Boldt-Christmas wrote: > From JBS: > >> The nmethodLocker is pretty nasty as it prevents an nmethod from being freed, but without really keeping it alive. We would like to minimize its use. 
The most obvious places where it can be removed, is when "protecting" nmethods that are already on-stack. Neither the sweeper nor the GC is interested in making nmethods on-stack not live. These ones simply do not do anything. > > Removed the `nmethodLocker` where the nmethod is a caller on the stack. > > Testing: tier1-7 Okay. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9444 From eosterlund at openjdk.org Sat Jul 30 07:11:36 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Sat, 30 Jul 2022 07:11:36 GMT Subject: RFR: 8290062: Remove nmethodLocker for nmethods on-stack In-Reply-To: References: Message-ID: On Mon, 11 Jul 2022 07:49:09 GMT, Axel Boldt-Christmas wrote: > From JBS: > >> The nmethodLocker is pretty nasty as it prevents an nmethod from being freed, but without really keeping it alive. We would like to minimize its use. The most obvious places where it can be removed, is when "protecting" nmethods that are already on-stack. Neither the sweeper nor the GC is interested in making nmethods on-stack not live. These ones simply do not do anything. > > Removed the `nmethodLocker` where the nmethod is a caller on the stack. > > Testing: tier1-7 The nmethodLockers targeted in this PR are "protecting" nmethods that are already on the stack. Nmethods that are on the stack are kept alive by both sweeper and GC as they both walk stacks. Therefore, using nmethodLocker for on-stack nmethods is redundant.
------------- PR: https://git.openjdk.org/jdk/pull/9444 From dholmes at openjdk.org Sat Jul 30 07:50:19 2022 From: dholmes at openjdk.org (David Holmes) Date: Sat, 30 Jul 2022 07:50:19 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 21:19:27 GMT, Roger Riggs wrote: >> src/hotspot/share/prims/jvm.cpp line 4059: >> >>> 4057: return JVM_CLASSFILE_MAJOR_VERSION; >>> 4058: } >>> 4059: assert(!java_lang_Class::as_Klass(mirror)->is_array_klass(), "unexpected array class"); >> >> Can this throw IllegalArgumentException instead. >> Asserts only report problems when built with debug (Right?) > > Or, Can the VM do this traversal as/more efficiently than doing at the java level? It is usual for the Java code to do such checks and throw exceptions, so that the VM assumes it is correct and only asserts in case something has been missed on the Java side. ------------- PR: https://git.openjdk.org/jdk/pull/9688 From dholmes at openjdk.org Sat Jul 30 07:50:20 2022 From: dholmes at openjdk.org (David Holmes) Date: Sat, 30 Jul 2022 07:50:20 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information In-Reply-To: References: Message-ID: On Sat, 30 Jul 2022 07:44:11 GMT, David Holmes wrote: >> Or, Can the VM do this traversal as/more efficiently than doing at the java level? > > It is usual for the Java code to do such checks and throw exceptions, so that the VM assumes it is correct and only asserts in case something has been missed on the Java side. Though in this case the Java code has defined behaviour for array types so it is correct for the VM to assume this is not an array type and to assert if it is.
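To illustrate the division of labour described above, here is a minimal sketch that assumes the Java side strips array dimensions before a class ever reaches the VM entry point. The class `ElementTypeSketch` and the helper `elementType` are hypothetical and are not the code in the PR:

```java
public class ElementTypeSketch {
    // Hypothetical helper: walk from an array class down to its innermost
    // component type, so a VM-side check for "not an array" always holds.
    static Class<?> elementType(Class<?> c) {
        while (c.isArray()) {
            c = c.getComponentType();
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(elementType(String[][].class).getName()); // prints java.lang.String
        System.out.println(elementType(int[].class).getName());      // prints int
    }
}
```

With the unwrapping done in Java, an array class arriving in the VM would indicate a bug on the Java side, which is the situation an assert is meant to catch.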
------------- PR: https://git.openjdk.org/jdk/pull/9688 From kvn at openjdk.org Sat Jul 30 20:02:34 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 30 Jul 2022 20:02:34 GMT Subject: RFR: 8290062: Remove nmethodLocker for nmethods on-stack In-Reply-To: References: Message-ID: <8rMbpgD9n_c_u3hblp6uF32iStDuxPfY6y_tFQCb2tQ=.76f3051c-61ab-4481-8f39-dbf5f75329c8@github.com> On Sat, 30 Jul 2022 07:07:53 GMT, Erik Österlund wrote: > The nmethodLockers targeted in this PR are "protecting" nmethods that are already on the stack. Nmethods that are on the stack are kept alive by both sweeper and GC as they both walk stacks. Therefore, using nmethodLocker for on-stack nmethods is redundant. Thank you for explaining. ------------- PR: https://git.openjdk.org/jdk/pull/9444 From duke at openjdk.org Sun Jul 31 15:30:23 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Sun, 31 Jul 2022 15:30:23 GMT Subject: RFR: 8283232: x86: Improve vector broadcast operations [v13] In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 13:48:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the generation of broadcasting a scalar in several ways: >> >> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines. >> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high.
>> - Load vectors using the same kind (integral vs floating point) of instructions as that of the results to avoid potential data bypass delay >> >> With this patch, the result of the added benchmark, which performs some operations with a really high register pressure, on my machine with Intel i7-7700HQ (avx2) is as follow:
>>
>>                                         Before            After
>> Benchmark                  Mode  Cnt    Score    Error    Score    Error  Units     Gain
>> SpiltReplicate.testDouble  avgt    5   42.621 ±  0.598   38.771 ±  0.797  ns/op   +9.03%
>> SpiltReplicate.testFloat   avgt    5   42.245 ±  1.464   38.603 ±  0.367  ns/op   +8.62%
>> SpiltReplicate.testInt     avgt    5   20.581 ±  5.791   13.755 ±  0.375  ns/op  +33.17%
>> SpiltReplicate.testLong    avgt    5   17.794 ±  4.781   13.663 ±  0.387  ns/op  +23.22%
>>
>> As expected, the constant table sizes shrink significantly from 1024 bytes to 256 bytes for `long`/`double` and 128 bytes for `int`/`float` cases. >> >> This patch also removes some redundant code paths and renames some incorrectly named instructions. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add load_constant_vector Thanks for your reviews. Does this PR need another run through the tests? ------------- PR: https://git.openjdk.org/jdk/pull/7832 From dholmes at openjdk.org Sun Jul 31 22:25:37 2022 From: dholmes at openjdk.org (David Holmes) Date: Sun, 31 Jul 2022 22:25:37 GMT Subject: RFR: 8291360: Create entry points to expose low-level class file information In-Reply-To: References: Message-ID: On Fri, 29 Jul 2022 18:02:46 GMT, Harold Seigel wrote: > Please review this change to fix JDK-8291360. This fix adds entry points getClassFileVersion() and getClassAccessFlagsRaw() to class java.lang.Class. The new entry points return the current class's class file version and its raw access flags. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 1-3 on Linux x64.
Additionally, the JCK lang, vm, and api tests and new regression tests were run locally on Linux x64. > > Thanks, Harold Hi Harold, Generally seems fine. A few nits and comments. Thanks. src/hotspot/share/include/jvm.h line 1163: > 1161: > 1162: /* > 1163: * Value types support. Value types? This is supporting the core reflection work isn't it? src/hotspot/share/prims/jvm.cpp line 4050: > 4048: /* > 4049: * Return the current class's class file version. The low order 16 bits of the > 4050: * the returned jint contains the class's major version. The high order 16 bits typo "the the" across the lines typo s/contains/contain/ src/hotspot/share/prims/jvm.cpp line 4051: > 4049: * Return the current class's class file version. The low order 16 bits of the > 4050: * the returned jint contains the class's major version. The high order 16 bits > 4051: * contains the class's minor version. typo s/contains/contain/ src/hotspot/share/prims/jvm.cpp line 4064: > 4062: assert(c->is_instance_klass(), "must be"); > 4063: InstanceKlass* ik = InstanceKlass::cast(c); > 4064: return (ik->minor_version() << 16) | ik->major_version(); I'm curious why the format is minor:major rather than major:minor ? src/java.base/share/classes/java/lang/Class.java line 4698: > 4696: * > 4697: * If the class is an array type then the access flags of the component type is > 4698: * returned. If the class is a primitive then ACC_ABSTRACT | ACC_FINAL | ACC_PUBLIC. The `ACC_ABSTRACT` seems odd - is that way of indicating this "class" can't be instantiated? Is there some spec document that explains this choice? test/hotspot/jtreg/runtime/ClassFile/ClassAccessFlagsRawTest.java line 60: > 58: int flags = (int)m.invoke((new int[3]).getClass()); > 59: if (flags != 1041) { > 60: throw new RuntimeException("expected 1041, got " + flags + " for primitive array"); Hex output would be clearer here too. 
test/hotspot/jtreg/runtime/ClassFile/ClassAccessFlagsRawTest.java line 66: > 64: flags = (int)m.invoke((new SUPERnotset[2]).getClass()); > 65: if (flags != 1) { > 66: throw new RuntimeException("expected 1, got " + flags + " for object array"); Again hex output would be clearer test/hotspot/jtreg/runtime/ClassFile/ClassFileVersionTest.java line 31: > 29: * @modules java.base/java.lang:open > 30: * @compile classFileVersions.jcod > 31: * @run main/othervm --enable-preview ClassFileVersionTest What preview feature is being used here? test/hotspot/jtreg/runtime/ClassFile/ClassFileVersionTest.java line 45: > 43: if (ver != expectedResult) { > 44: throw new RuntimeException( > 45: "expected " + expectedResult + ", got " + ver + " for class " + className); It would be clearer to show the expected and actual in minor:major format. That way if the test fails we can easily see which bit is wrong. test/hotspot/jtreg/runtime/ClassFile/ClassFileVersionTest.java line 55: > 53: > 54: testIt("Version64", 64); > 55: testIt("Version59", 59); Any particular reason to choose 59? Shouldn't there also be tests for non-zero minor versions? test/hotspot/jtreg/runtime/ClassFile/ClassFileVersionTest.java line 62: > 60: int ver = (int)m.invoke((new int[3]).getClass()); > 61: if (ver != 64) { > 62: throw new RuntimeException("expected 64, got " + ver + " for primitive array"); Again minor:major format. ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/9688
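
As a sketch of what minor:major reporting could look like in the test, assuming the packing shown in the `jvm.cpp` snippet under review (minor version in the high 16 bits, major version in the low 16 bits): the class `ClassFileVersionFormat` and its helpers are illustrative names, not part of the PR.

```java
public class ClassFileVersionFormat {
    // Mirrors the expression (minor << 16) | major from the reviewed VM code.
    static int pack(int minor, int major) {
        return (minor << 16) | major;
    }

    // Low-order 16 bits hold the major version.
    static int major(int packed) {
        return packed & 0xFFFF;
    }

    // High-order 16 bits hold the minor version; the unsigned shift keeps the
    // result non-negative even when the top bit of the packed int is set.
    static int minor(int packed) {
        return packed >>> 16;
    }

    // Render as "minor:major" for use in failure messages.
    static String format(int packed) {
        return minor(packed) + ":" + major(packed);
    }

    public static void main(String[] args) {
        System.out.println(format(pack(0, 64)));     // prints 0:64
        System.out.println(format(pack(3, 45)));     // prints 3:45
        System.out.println(format(pack(65535, 64))); // prints 65535:64
    }
}
```

A message such as `"expected " + format(expectedResult) + ", got " + format(ver)` would then show directly whether the major or the minor half is wrong.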