From alanb at openjdk.org Wed Mar 1 10:00:50 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 1 Mar 2023 10:00:50 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v2] In-Reply-To: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: > This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. > > As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. > > The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. There are some minor adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. > > Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. 
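For readers less familiar with the array-based calls mentioned in the description, here is a minimal sketch of how `getThreadCpuTime(long[])` on the JDK-specific `com.sun.management.ThreadMXBean` reports one value per requested ID, with `-1` for an ID that has no live thread behind it. Per the description, IDs of virtual threads are meant to be reported the same way once the filtering is done in the jmm functions. The names `BulkCpuTimeSketch` and `cpuTimes` are made up for illustration, and returning an array of `-1` when the extension or CPU time measurement is unavailable is a choice of this sketch, not something the PR does.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;

final class BulkCpuTimeSketch {

    // Returns one CPU time in nanoseconds per requested thread ID,
    // or -1 in each slot where no value is available.
    static long[] cpuTimes(long... ids) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean && bean.isThreadCpuTimeSupported()) {
            com.sun.management.ThreadMXBean ext = (com.sun.management.ThreadMXBean) bean;
            return ext.getThreadCpuTime(ids);
        }
        // Assumption of this sketch: report -1 everywhere if the JDK-specific
        // bean or CPU time measurement is not available.
        long[] unavailable = new long[ids.length];
        Arrays.fill(unavailable, -1L);
        return unavailable;
    }

    public static void main(String[] args) {
        long self = Thread.currentThread().getId(); // threadId() is the JDK 19+ spelling
        long unused = Long.MAX_VALUE;               // assumed not to be the ID of any live thread
        long[] times = cpuTimes(self, unused);
        System.out.println("current thread: " + times[0] + " ns");
        System.out.println("unused ID:      " + times[1]);
    }
}
```

The point of the array form is that a single bad ID does not fail the whole call; each slot is answered independently, which is why the filtering has to happen where the per-thread lookup is done.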
Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Clarify Thread CPU time seciton of spec - Merge - Fix minimal build - Fix minimal build - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12762/files - new: https://git.openjdk.org/jdk/pull/12762/files/c58765b3..7da35145 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12762&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12762&range=00-01 Stats: 3492 lines in 187 files changed: 2307 ins; 454 del; 731 mod Patch: https://git.openjdk.org/jdk/pull/12762.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12762/head:pull/12762 PR: https://git.openjdk.org/jdk/pull/12762 From alanb at openjdk.org Wed Mar 1 11:43:07 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 1 Mar 2023 11:43:07 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v2] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Tue, 28 Feb 2023 21:06:19 GMT, Mandy Chung wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Clarify Thread CPU time seciton of spec >> - Merge >> - Fix minimal build >> - Fix minimal build >> - Initial commit > > test/jdk/java/lang/management/ThreadMXBean/VirtualThreads.java line 258: > >> 256: long tid = Thread.currentThread().threadId(); >> 257: long cpuTime = bean.getThreadCpuTime(tid); >> 258: assertEquals(-1L, cpuTime); > > Am I correct that `getThreadCpuTime(tid)` returns -1 when the current thread is a virtual thread whereas `getCurrentThreadCpuTime` throws UOE in the current implementation? > > `getCurrentThreadCpuTime` is specified to be equivalent to calling `getThreadCpuTime(Thread.currentThread().threadId())`. We didn't get this quite right in JDK 19 but I think I've fixed all those issues now. So assuming the VM supports CPU time for all platform threads, it means: - isThreadCpuTimeEnabled and isCurrentThreadCpuTimeSupported will return true. - If getThreadCpuTime(long) is called with the thread ID of a virtual thread then -1 will be returned. - If getCurrentThreadCpuTime() is called from a virtual thread then -1 will be returned. I did another pass over the API docs and updated the "Thread CPU time" section, so I hope it is clearer now. ------------- PR: https://git.openjdk.org/jdk/pull/12762 From alanb at openjdk.org Wed Mar 1 12:39:52 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 1 Mar 2023 12:39:52 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v3] In-Reply-To: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: > This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. 
> > As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. > > The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. There are some minor adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. > > Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. 
Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: Update isXXXThreadCpuTimeSupported descriptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12762/files - new: https://git.openjdk.org/jdk/pull/12762/files/7da35145..956a1e0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12762&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12762&range=01-02 Stats: 11 lines in 1 file changed: 0 ins; 2 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/12762.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12762/head:pull/12762 PR: https://git.openjdk.org/jdk/pull/12762 From alanb at openjdk.org Wed Mar 1 12:39:55 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 1 Mar 2023 12:39:55 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v3] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Tue, 28 Feb 2023 20:49:51 GMT, Mandy Chung wrote: >> Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: >> >> Update isXXXThreadCpuTimeSupported descriptions > > src/java.management/share/classes/java/lang/management/ThreadMXBean.java line 529: > >> 527: /** >> 528: * Tests if the Java virtual machine implementation supports CPU time >> 529: * measurement for any platform thread. > > This change can also apply in `@return` (line 538) Yes, this makes sense, although getting the wording right for isCurrentThreadCpuTimeSupported is tricky because it's about the caller of getCurrentThreadCpuTime/getCurrentThreadUserTime rather than the caller of isCurrentThreadCpuTimeSupported. Hopefully it is clearer now. 
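The behaviour discussed in this thread can be sketched from the caller's side: check `isCurrentThreadCpuTimeSupported()` first, then treat a `-1` result as "no measurement for this thread", which is what a virtual thread is expected to see with this PR. `CurrentCpuTimeSketch` and `currentCpuTimeNanos` are hypothetical names, and catching `UnsupportedOperationException` is only there because, as raised earlier in the thread, the JDK 19 implementation could throw it when the caller was a virtual thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.OptionalLong;

final class CurrentCpuTimeSketch {

    // CPU time of the calling thread in nanoseconds, or empty if no
    // measurement is available for it.
    static OptionalLong currentCpuTimeNanos() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return OptionalLong.empty();
        }
        try {
            long nanos = bean.getCurrentThreadCpuTime();
            // -1 means "not available", e.g. measurement disabled or,
            // with this PR, a virtual thread as the caller.
            return nanos < 0 ? OptionalLong.empty() : OptionalLong.of(nanos);
        } catch (UnsupportedOperationException e) {
            // Assumption of this sketch: fold the pre-fix behaviour into "empty".
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        OptionalLong nanos = currentCpuTimeNanos();
        System.out.println(nanos.isPresent()
                ? "CPU time: " + nanos.getAsLong() + " ns"
                : "no CPU time available for this thread");
    }
}
```

Note that the supported check describes the thread that later calls `getCurrentThreadCpuTime()`, not the thread that asked, which is the subtlety in the wording mentioned above.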
------------- PR: https://git.openjdk.org/jdk/pull/12762 From kevinw at openjdk.org Wed Mar 1 14:43:05 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 Mar 2023 14:43:05 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: Message-ID: <1TIiI4FrqOAa2bhZQvTZthWmWuwZ8slzGdj-QzeOdcY=.efa75e2e-3ee5-4c54-82de-7823a9080356@github.com> On Wed, 15 Feb 2023 00:13:22 GMT, Chris Plummer wrote: > While working on [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), I ended up doing a lot of cleanup of nsk/share/jdi/EventHandler.java, so much so that the changes distract from the actual bug fix, so I decided it would be best to first push them with a separate CR. Changes include: > > - The main change is merging waitForRequestedEvent() and waitForRequestedEventSet(). These methods are quite big and almost identical. I had to add some more code to them for [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), and decided it was best to merge them first rather than making this code cloning situation even worse. > - Remove EventFilter.filtered() call when dealing with an uncaught exception event. These events are never filtered. > - Add complain() method. We already have a display() method so you can just call display() instead of log.display(), and it also adds the "EventHandler> " prefix to the output, so I thought it would be good to do the same for log.complain() uses, especially since some of the uses also were adding the same prefix. > - Added a new EventListener that simply logs the event. > > The remaining changes are just minor local edits whose purpose should be obvious. > > If you need a bit of background on this code, read on. EventHandler has a run() method that continually queries the event queue for more JDI events. When they come in, the registered EventListeners all get various callbacks. First eventSetReceived() is called for each. 
Then eventReceived() is called for each listener for each Event in the EventSet. If any eventReceived() returns true, that means the event was handled and no more listeners should be called. Finally, eventSetComplete() is called for each listener. > > There are a number of default listeners registered that are always in place. See createDefaultListeners(). They aren't really of concern with this PR. The waitForRequestedEvent() and waitForRequestedEventSet() methods (now combined into waitForRequestedEventSetCommon()) also register listeners, with the main one being used to check for the specific event that has been requested. These listeners are pre-pended to the default list. Listeners are always called in reverse of the order added. > > After setting up the listeners, waitForRequestedEventSetCommon() will block until a matching event has arrived. This is detected by the setting of en.set (and en.event). The listener will set them when the event matches, and this is done async from the thread that is running the run() method (remember, the run() method calls listener.eventSetReceived()). Marked as reviewed by kevinw (Committer). Looks good I think 8-) ------------- PR: https://git.openjdk.org/jdk/pull/12568 From lmesnik at openjdk.org Wed Mar 1 16:07:17 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 1 Mar 2023 16:07:17 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: Message-ID: On Wed, 15 Feb 2023 00:13:22 GMT, Chris Plummer wrote: > While working on [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), I ended up doing a lot of cleanup of nsk/share/jdi/EventHandler.java, so much so that the changes distract from the actual bug fix, so I decided it would be best to first push them with a separate CR. Changes include: > > - The main change is merging waitForRequestedEvent() and waitForRequestedEventSet(). These methods are quite big and almost identical. 
I had to add some more code to them for [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), and decided it was best to merge them first rather than making this code cloning situation even worse. > - Remove EventFilter.filtered() call when dealing with an uncaught exception event. These events are never filtered. > - Add complain() method. We already have a display() method so you can just call display() instead of log.display(), and it also adds the "EventHandler> " prefix to the output, so I thought it would be good to do the same for log.complain() uses, especially since some of the uses also were adding the same prefix. > - Added a new EventListener that simply logs the event. > > The remaining changes are just minor local edits whose purpose should be obvious. > > If you need a bit of background on this code, read on. EventHandler has a run() method that continually queries the event queue for more JDI events. When they come in, the registered EventListeners all get various callbacks. First eventSetReceived() is called for each. Then eventReceived() is called for each listener for each Event in the EventSet. If any eventReceived() returns true, that means the event was handled and no more listeners should be called. Finally, eventSetComplete() is called for each listener. > > There are a number of default listeners registered that are always in place. See createDefaultListeners(). They aren't really of concern with this PR. The waitForRequestedEvent() and waitForRequestedEventSet() methods (now combined into waitForRequestedEventSetCommon()) also register listeners, with the main one being used to check for the specific event that has been requested. These listeners are pre-pended to the default list. Listeners are always called in reverse of the order added. > > After setting up the listeners, waitForRequestedEventSetCommon() will block until a matching event has arrived. This is detected by the setting of en.set (and en.event). 
The listener will set them when the event matches, and this is done async from the thread that is running the run() method (remember, the run() method calls listener.eventSetReceived()). It seems that the brace style is incorrect in some places. test/hotspot/jtreg/vmTestbase/nsk/share/jdi/EventHandler.java line 336: > 334: defaultExceptionRequest != null && > 335: defaultExceptionRequest.equals(event.request())) > 336: { wrong indentation style test/hotspot/jtreg/vmTestbase/nsk/share/jdi/EventHandler.java line 372: > 370: > 371: private class EventNotification { > 372: volatile Event event = null; The assignment to null is not needed. test/hotspot/jtreg/vmTestbase/nsk/share/jdi/EventHandler.java line 382: > 380: long timeout, > 381: boolean shouldRemoveListeners) > 382: { wrong brace style ------------- Changes requested by lmesnik (Reviewer). PR: https://git.openjdk.org/jdk/pull/12568 From cjplummer at openjdk.org Wed Mar 1 19:51:13 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 1 Mar 2023 19:51:13 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 15:59:57 GMT, Leonid Mesnik wrote: >> While working on [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), I ended up doing a lot of cleanup of nsk/share/jdi/EventHandler.java, so much so that the changes distract from the actual bug fix, so I decided it would be best to first push them with a separate CR. Changes include: >> >> - The main change is merging waitForRequestedEvent() and waitForRequestedEventSet(). These methods are quite big and almost identical. I had to add some more code to them for [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), and decided it was best to merge them first rather than making this code cloning situation even worse. >> - Remove EventFilter.filtered() call when dealing with an uncaught exception event. These events are never filtered. >> - Add complain() method. 
We already have a display() method so you can just call display() instead of log.display(), and it also adds the "EventHandler> " prefix to the output, so I thought it would be good to do the same for log.complain() uses, especially since some of the uses also were adding the same prefix. >> - Added a new EventListener that simply logs the event. >> >> The remaining changes are just minor local edits whose purpose should be obvious. >> >> If you need a bit of background on this code, read on. EventHandler has a run() method that continually queries the event queue for more JDI events. When they come in, the registered EventListeners all get various callbacks. First eventSetReceived() is called for each. Then eventReceived() is called for each listener for each Event in the EventSet. If any eventReceived() returns true, that means the event was handled and no more listeners should be called. Finally, eventSetComplete() is called for each listener. >> >> There are a number of default listeners registered that are always in place. See createDefaultListeners(). They aren't really of concern with this PR. The waitForRequestedEvent() and waitForRequestedEventSet() methods (now combined into waitForRequestedEventSetCommon()) also register listeners, with the main one being used to check for the specific event that has been requested. These listeners are pre-pended to the default list. Listeners are always called in reverse of the order added. >> >> After setting up the listeners, waitForRequestedEventSetCommon() will block until a matching event has arrived. This is detected by the setting of en.set (and en.event). The listener will set them when the event matches, and this is done async from the thread that is running the run() method (remember, the run() method calls listener.eventSetReceived()). 
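The callback sequence described in the quoted background can be modelled in a few lines. This is a simplified sketch, not the code in nsk/share/jdi/EventHandler.java: events are plain strings rather than JDI `Event` objects, and `DispatchSketch`, `Listener`, `addListener` and `dispatch` are hypothetical names. Two details are assumptions, since the description does not spell them out: the "handled" short-circuit is applied per event, and new listeners are stored at the front of the list so that the most recently added one is called first.

```java
import java.util.ArrayList;
import java.util.List;

final class DispatchSketch {

    interface Listener {
        default void eventSetReceived(List<String> set) { }
        boolean eventReceived(String event);   // true means "handled, stop here"
        default void eventSetComplete(List<String> set) { }
    }

    private final List<Listener> listeners = new ArrayList<>();

    // Prepend, so listeners run in reverse of the order in which they were added.
    void addListener(Listener listener) {
        listeners.add(0, listener);
    }

    void dispatch(List<String> eventSet) {
        for (Listener l : listeners) {
            l.eventSetReceived(eventSet);
        }
        for (String event : eventSet) {
            for (Listener l : listeners) {
                if (l.eventReceived(event)) {
                    break;   // handled: remaining listeners do not see this event
                }
            }
        }
        for (Listener l : listeners) {
            l.eventSetComplete(eventSet);
        }
    }

    // Registers a "default" listener, then a "requested" one that handles
    // only the breakpoint event, and records who saw what.
    static List<String> demo() {
        List<String> trace = new ArrayList<>();
        DispatchSketch handler = new DispatchSketch();
        handler.addListener(event -> { trace.add("default:" + event); return false; });
        handler.addListener(event -> { trace.add("requested:" + event); return event.equals("breakpoint"); });
        handler.dispatch(List.of("breakpoint", "step"));
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo the later-added listener sees both events first and swallows the breakpoint, so the default listener only ever sees the step event.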
> > test/hotspot/jtreg/vmTestbase/nsk/share/jdi/EventHandler.java line 336: > >> 334: defaultExceptionRequest != null && >> 335: defaultExceptionRequest.equals(event.request())) >> 336: { > > wrong indentation style How would you recommend doing it? Is there a style guide that covers this (I can never find it when I need it). ------------- PR: https://git.openjdk.org/jdk/pull/12568 From lmesnik at openjdk.org Wed Mar 1 21:00:14 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 1 Mar 2023 21:00:14 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: Message-ID: <_pnY7BdDjc_m_wIVT0Y0c1-McWrcNVPu2nqLDdBXTiQ=.e4c2e141-6f12-4854-9d63-96377dde5b2f@github.com> On Wed, 1 Mar 2023 19:48:36 GMT, Chris Plummer wrote: >> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/EventHandler.java line 336: >> >>> 334: defaultExceptionRequest != null && >>> 335: defaultExceptionRequest.equals(event.request())) >>> 336: { >> >> wrong indentation style > > How would you recommend doing it? Is there a style guide that covers this (I can never find it when I need it). Just as it was before. if (event instanceof ExceptionEvent && defaultExceptionRequest != null && defaultExceptionRequest.equals(event.request())) { ------------- PR: https://git.openjdk.org/jdk/pull/12568 From cjplummer at openjdk.org Wed Mar 1 21:06:16 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 1 Mar 2023 21:06:16 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: <_pnY7BdDjc_m_wIVT0Y0c1-McWrcNVPu2nqLDdBXTiQ=.e4c2e141-6f12-4854-9d63-96377dde5b2f@github.com> References: <_pnY7BdDjc_m_wIVT0Y0c1-McWrcNVPu2nqLDdBXTiQ=.e4c2e141-6f12-4854-9d63-96377dde5b2f@github.com> Message-ID: On Wed, 1 Mar 2023 20:57:24 GMT, Leonid Mesnik wrote: >> How would you recommend doing it? Is there a style guide that covers this (I can never find it when I need it). > > Just as it was before. 
> if (event instanceof ExceptionEvent && > defaultExceptionRequest != null && > defaultExceptionRequest.equals(event.request())) { I find that much less readable since the 2nd and 3rd lines of the `if` expression are indented the same as the first statement that follows. Previously they had added a blank line to resolve this, but I don't like that either. You shouldn't have a blank line at the start of a compound statement block. ------------- PR: https://git.openjdk.org/jdk/pull/12568 From lmesnik at openjdk.org Wed Mar 1 21:15:13 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 1 Mar 2023 21:15:13 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: Message-ID: <2oAG8jGvOrcQKRTgZ53VO9Y9QbzbSuWilZrcfkpggfs=.9a06de84-6a39-4191-9db9-b050ee8baa30@github.com> On Wed, 15 Feb 2023 00:13:22 GMT, Chris Plummer wrote: > While working on [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), I ended up doing a lot of cleanup of nsk/share/jdi/EventHandler.java, so much so that the changes distract from the actual bug fix, so I decided it would be best to first push them with a separate CR. Changes include: > > - The main change is merging waitForRequestedEvent() and waitForRequestedEventSet(). These methods are quite big and almost identical. I had to add some more code to them for [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), and decided it was best to merge them first rather than making this code cloning situation even worse. > - Remove EventFilter.filtered() call when dealing with an uncaught exception event. These events are never filtered. > - Add complain() method. We already have a display() method so you can just call display() instead of log.display(), and it also adds the "EventHandler> " prefix to the output, so I thought it would be good to do the same for log.complain() uses, especially since some of the uses also were adding the same prefix. 
> - Added a new EventListener that simply logs the event. > > The remaining changes are just minor local edits whose purpose should be obvious. > > If you need a bit of background on this code, read on. EventHandler has a run() method that continually queries the event queue for more JDI events. When they come in, the registered EventListeners all get various callbacks. First eventSetReceived() is called for each. Then eventReceived() is called for each listener for each Event in the EventSet. If any eventReceived() returns true, that means the event was handled and no more listeners should be called. Finally, eventSetComplete() is called for each listener. > > There are a number of default listeners registered that are always in place. See createDefaultListeners(). They aren't really of concern with this PR. The waitForRequestedEvent() and waitForRequestedEventSet() methods (now combined into waitForRequestedEventSetCommon()) also register listeners, with the main one being used to check for the specific event that has been requested. These listeners are pre-pended to the default list. Listeners are always called in reverse of the order added. > > After setting up the listeners, waitForRequestedEventSetCommon() will block until a matching event has arrived. This is detected by the setting of en.set (and en.event). The listener will set them when the event matches, and this is done async from the thread that is running the run() method (remember, the run() method calls listener.eventSetReceived()). Marked as reviewed by lmesnik (Reviewer). 
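The hand-off described in the last paragraph of the quoted background (a listener records the matching event while another thread blocks until it shows up) could look roughly like this. It is a sketch under assumptions: `WaitSketch`, `Notification`, `publish` and `await` are invented names, events are strings, and using `wait`/`notifyAll` with a deadline is this sketch's choice; the thread does not show how waitForRequestedEventSetCommon() actually waits.

```java
final class WaitSketch {

    static final class Notification {
        private String event;   // guarded by this; null until a match arrives

        // Called from the event-handling thread when the requested event matches.
        synchronized void publish(String matched) {
            event = matched;
            notifyAll();
        }

        // Called from the waiting thread; returns null if the timeout expires first.
        synchronized String await(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (event == null) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return null;
                }
                wait(remaining);   // loop guards against spurious wakeups
            }
            return event;
        }
    }

    static String demo() {
        Notification en = new Notification();
        Thread eventThread = new Thread(() -> en.publish("breakpoint"));
        eventThread.start();
        try {
            String result = en.await(5_000);
            eventThread.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whatever the real mechanism is, the waiter has to tolerate the event arriving before it starts waiting, which is why the sketch checks the field before blocking.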
------------- PR: https://git.openjdk.org/jdk/pull/12568 From lmesnik at openjdk.org Wed Mar 1 21:15:15 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 1 Mar 2023 21:15:15 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: <_pnY7BdDjc_m_wIVT0Y0c1-McWrcNVPu2nqLDdBXTiQ=.e4c2e141-6f12-4854-9d63-96377dde5b2f@github.com> Message-ID: <7Ha78uPkbtXxTmAXQx53rw_vL8BWqz3oAhsd1_PIMHI=.fb05745e-6942-4d31-ba15-d8a016b3f98d@github.com> On Wed, 1 Mar 2023 21:02:50 GMT, Chris Plummer wrote: >> Just as it was before. >> if (event instanceof ExceptionEvent && >> defaultExceptionRequest != null && >> defaultExceptionRequest.equals(event.request())) { > > I find that much less readable since the 2nd and 3rd lines of the `if` expression are indented the same as the first statement that follows. Previously they had added a blank line to resolve this, but I don't like that either. You shouldn't have a blank line at the start of a compound statement block. Well, I am not going to insist. ------------- PR: https://git.openjdk.org/jdk/pull/12568 From cjplummer at openjdk.org Wed Mar 1 21:38:20 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 1 Mar 2023 21:38:20 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: <7Ha78uPkbtXxTmAXQx53rw_vL8BWqz3oAhsd1_PIMHI=.fb05745e-6942-4d31-ba15-d8a016b3f98d@github.com> References: <_pnY7BdDjc_m_wIVT0Y0c1-McWrcNVPu2nqLDdBXTiQ=.e4c2e141-6f12-4854-9d63-96377dde5b2f@github.com> <7Ha78uPkbtXxTmAXQx53rw_vL8BWqz3oAhsd1_PIMHI=.fb05745e-6942-4d31-ba15-d8a016b3f98d@github.com> Message-ID: On Wed, 1 Mar 2023 21:12:20 GMT, Leonid Mesnik wrote: >> I find that much less readable since the 2nd and 3rd lines of the `if` expression are indented the same as the first statement that follows. Previously they had added a blank line to resolve this, but I don't like that either. You shouldn't have a blank line at the start of a compound statement block. 
> > Well, I am not going to insist. The following doc (and I have no idea how "official" it is since it is 25 years old): https://www.oracle.com/technetwork/java/codeconventions-150003.pdf Says to indent the 2nd and 3rd lines by 8 instead of 4. See page 6. Does that seem ok? ------------- PR: https://git.openjdk.org/jdk/pull/12568 From mchung at openjdk.org Wed Mar 1 21:41:05 2023 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 1 Mar 2023 21:41:05 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v3] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: <8oxX7u8M2132OZEhwIskF2O3ACV2EF6mjnoJwvUF3wA=.5ea494c5-ee73-4495-b135-b690f2eae463@github.com> On Wed, 1 Mar 2023 12:39:52 GMT, Alan Bateman wrote: >> This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. >> >> As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. >> >> The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. 
It also fixes ThreadMXBean.getCurrentThreadCpuTime and getCurrentThreadUserTime to not throw UOE when CPU time measurement from a platform thread is supported. There are some small adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. >> >> Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Update isXXXThreadCpuTimeSupported descriptions Marked as reviewed by mchung (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12762 From mchung at openjdk.org Wed Mar 1 21:41:07 2023 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 1 Mar 2023 21:41:07 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v3] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Wed, 1 Mar 2023 11:40:26 GMT, Alan Bateman wrote: >> test/jdk/java/lang/management/ThreadMXBean/VirtualThreads.java line 258: >> >>> 256: long tid = Thread.currentThread().threadId(); >>> 257: long cpuTime = bean.getThreadCpuTime(tid); >>> 258: assertEquals(-1L, cpuTime); >> >> Am I correct that `getThreadCpuTime(tid)` returns -1 for the current thread is a virtual thread whereas `getCurrentThreadCpuTime` throws UOE in the current implementation? >> >> `getCurrentThreadCpuTime` is specified to be equivalent to calling `getThreadCpuTime(Thread.currentThread().threadId()`. > > We didn't get this quite right in JDK 19 but I think I've fixed all those issues now. So assuming the VM supports CPU time for all platform threads, it means: > > - isThreadCpuTimeEnabled and isCurrentThreadCpuTimeSupported will return true. 
> - If getThreadCpuTime(long) is called with the thread ID of a virtual thread then -1 will be returned. > - If getCurrentThreadCpuTime() is called from a virtual thread then -1 will be returned. > > I did another pass over the API docs, update the "Thread CPU time" section, so I hope it is clearer now. This looks better. I see `testGetCurrentThreadCpuTime` and `testGetCurrentThreadUserTime` test cases fixed. ------------- PR: https://git.openjdk.org/jdk/pull/12762 From mchung at openjdk.org Wed Mar 1 21:50:17 2023 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 1 Mar 2023 21:50:17 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v3] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Wed, 1 Mar 2023 12:39:52 GMT, Alan Bateman wrote: >> This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. >> >> As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. >> >> The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. 
It also fixes ThreadMXBean.getCurrentThreadCpuTime and getCurrentThreadUserTime to not throw UOE when CPU time measurement from a platform thread is supported. There are some small adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. >> >> Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Update isXXXThreadCpuTimeSupported descriptions src/java.management/share/classes/java/lang/management/ThreadMXBean.java line 479: > 477: * if the thread of the specified ID exists, the thread is alive, > 478: * and CPU time measurement is enabled; {@code -1} if not enabled > 479: * or the specified ID is a virtual thread It should be "{@code -1} if not enabled or the specified ID is a virtual thread or the thread does not exist or not alive." Would this be simpler: * @return the total CPU time for a thread of the specified ID * if the thread of the specified ID is a platform thread, the thread is alive, * and CPU time measurement is enabled; {@code -1} otherwise. I'm fine with either way. 
Same comment for `getThreadUserTime(long)` ------------- PR: https://git.openjdk.org/jdk/pull/12762 From lmesnik at openjdk.org Wed Mar 1 21:57:15 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 1 Mar 2023 21:57:15 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: <_pnY7BdDjc_m_wIVT0Y0c1-McWrcNVPu2nqLDdBXTiQ=.e4c2e141-6f12-4854-9d63-96377dde5b2f@github.com> <7Ha78uPkbtXxTmAXQx53rw_vL8BWqz3oAhsd1_PIMHI=.fb05745e-6942-4d31-ba15-d8a016b3f98d@github.com> Message-ID: <_uOPaudNltpW0mSYEaM-sUxKA6HFFaJBKn5fWCSKENQ=.6597f07c-536e-4571-b702-c24fde5d7ba7@github.com> On Wed, 1 Mar 2023 21:35:34 GMT, Chris Plummer wrote: >> Well, I am not going to insist. > > The following doc (and I have no idea how "official" it is since it is 25 years old): > > https://www.oracle.com/technetwork/java/codeconventions-150003.pdf > > Says to indent the 2nd and 3rd lines by 8 instead of 4. See page 6. Does that seem ok? Even if it is not recommended by the code style, I think the general rule (for all of hotspot) is to keep indentation consistent. There are a lot of places where 4 spaces are used. So it is fine to leave 4 spaces, but you could update it to 8 if that makes the code more readable. ------------- PR: https://git.openjdk.org/jdk/pull/12568 From cjplummer at openjdk.org Thu Mar 2 02:42:49 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 02:42:49 GMT Subject: RFR: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java [v2] In-Reply-To: References: Message-ID: > While working on [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), I ended up doing a lot of cleanup of nsk/share/jdi/EventHandler.java, so much so that the changes distract from the actual bug fix, so I decided it would be best to first push them with a separate CR. Changes include: > > - The main change is merging waitForRequestedEvent() and waitForRequestedEventSet(). These methods are quite big and almost identical.
I had to add some more code to them for [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), and decided it was best to merge them first rather than making this code cloning situation even worse. > - Remove EventFilter.filtered() call when dealing with an uncaught exception event. These events are never filtered. > - Add complain() method. We already have a display() method so you can just call display() instead of log.display(), and it also adds the "EventHandler> " prefix to the output, so I thought it would be good to do the same for log.complain() uses, especially since some of the uses also were adding the same prefix. > - Added a new EventListener that simply logs the event. > > The remaining changes are just minor local edits whose purpose should be obvious. > > If you need a bit of background on this code, read on. EventHandler has a run() method that continually queries the event queue for more JDI events. When they come in, the registered EventListeners all get various callbacks. First eventSetReceived() is called for each. Then eventReceived() is called for each listener for each Event in the EventSet. If any eventReceived() returns true, that means the event was handled and no more listeners should be called. Finally, eventSetComplete() is called for each listener. > > There are a number of default listeners registered that are always in place. See createDefaultListeners(). They aren't really of concern with this PR. The waitForRequestedEvent() and waitForRequestedEventSet() methods (now combined into waitForRequestedEventSetCommon()) also register listeners, with the main one being used to check for the specific event that has been requested. These listeners are prepended to the default list. Listeners are always called in the reverse of the order in which they were added. > > After setting up the listeners, waitForRequestedEventSetCommon() will block until a matching event has arrived. This is detected by the setting of en.set (and en.event).
The listener will set them when the event matches, and this is done asynchronously from the thread that is running the run() method (remember, the run() method calls listener.eventSetReceived()). Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: Some minor formatting edits. Get rid of unnecessary null initializations. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12568/files - new: https://git.openjdk.org/jdk/pull/12568/files/ae5a9db4..b4a671bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12568&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12568&range=00-01 Stats: 9 lines in 1 file changed: 0 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12568.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12568/head:pull/12568 PR: https://git.openjdk.org/jdk/pull/12568 From amenkov at openjdk.org Thu Mar 2 02:49:09 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 Mar 2023 02:49:09 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unuque vtables Message-ID: Unique vtables for classes in vmStruct data are a requirement for SA to correctly detect hotspot classes. The fix adds a test to verify this requirement.
The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) ------------- Commit messages: - UniqueVtableTest Changes: https://git.openjdk.org/jdk/pull/12820/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12820&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303489 Stats: 177 lines in 1 file changed: 177 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12820.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12820/head:pull/12820 PR: https://git.openjdk.org/jdk/pull/12820 From duke at openjdk.org Thu Mar 2 03:35:15 2023 From: duke at openjdk.org (duke) Date: Thu, 2 Mar 2023 03:35:15 GMT Subject: Withdrawn: JDK-8295756 Improve NonLocalRegistry Manual Test Process In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 21:45:30 GMT, Bill Huang wrote: > The current non local registry tests require a manual process that runs rmiregitrty on a different machine and changes the -Dregistry.host property in the source before running the tests on the local machine. This task is created to improve this manual process and provide a clearer instruction to the test engineer about the test requirement. > > Tests include: > java/rmi/registry/nonLocalRegistry/NonLocalSkeletonTest.java > java/rmi/registry/nonLocalRegistry/NonLocalRegistryTest.java > javax/management/remote/nonLocalAccess/NonLocalJMXRemoteTest.java This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/10825 From cjplummer at openjdk.org Thu Mar 2 03:42:14 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 03:42:14 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unuque vtables In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 02:41:12 GMT, Alex Menkov wrote: > Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. > The fix adds test to verify this requirement. 
> > The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 84: > 82: MethodHandles.Lookup classLookup = MethodHandles.privateLookupIn(BasicTypeDataBase.class, lookup); > 83: vtblForType = classLookup.findVirtual(BasicTypeDataBase.class, "vtblForType", > 84: MethodType.methodType(Address.class, Type.class)); I think it would be ok to just make vtblForType() public so you won't need to use reflection. The public SA APIs are not a spec. We are free to do things in the future that are not backwards compatible. ------------- PR: https://git.openjdk.org/jdk/pull/12820 From alanb at openjdk.org Thu Mar 2 08:18:03 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 2 Mar 2023 08:18:03 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v4] In-Reply-To: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: > This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. > > As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. 
That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. > > The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. It also fixes ThreadMXBean.getCurrentThreadCpuTime and getCurrentThreadUserTime to not throw UOE when CPU time measurement from a platform thread is supported. There are some small adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. > > Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Tweak javadoc to avoid listing too many conditions in @return description - Merge - Update isXXXThreadCpuTimeSupported descriptions - Clarify Thread CPU time seciton of spec - Merge - Fix minimal build - Fix minimal build - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12762/files - new: https://git.openjdk.org/jdk/pull/12762/files/956a1e0d..dd212cdb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12762&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12762&range=02-03 Stats: 1256 lines in 50 files changed: 999 ins; 156 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/12762.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12762/head:pull/12762 PR: https://git.openjdk.org/jdk/pull/12762 From alanb at openjdk.org Thu Mar 2 08:18:05 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 2 Mar 2023 08:18:05 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v3] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Wed, 1 Mar 2023 21:46:54 GMT, Mandy Chung wrote: >> Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: >> >> Update isXXXThreadCpuTimeSupported descriptions > > src/java.management/share/classes/java/lang/management/ThreadMXBean.java line 479: > >> 477: * if the thread of the specified ID exists, the thread is alive, >> 478: * and CPU time measurement is enabled; {@code -1} if not enabled >> 479: * or the specified ID is a virtual thread > > It should be "{@code -1} if not enabled or the specified ID is a virtual thread or the thread does not exist or not alive." 
> > Would this be simpler: > > > * @return the total CPU time for a thread of the specified ID > * if the thread of the specified ID is a platform thread, the thread is alive, > * and CPU time measurement is enabled; {@code -1} otherwise. > > > I'm fine with either way. Same comment for `getThreadUserTime(long)` That is a bit better as it avoids needing to list conditions for the "otherwise" case. I've updated these methods to use that style and also updated the CSR. ------------- PR: https://git.openjdk.org/jdk/pull/12762 From sspitsyn at openjdk.org Thu Mar 2 08:53:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 Mar 2023 08:53:11 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Nov 2022 10:14:23 GMT, Serguei Spitsyn wrote: >> This problem has two sides. >> One is that `VirtualThread::run()` caches the value of the field `notifyJvmtiEvents`. >> As a result, the native method `notifyJvmtiUnmountBegin()` is not called after the field `notifyJvmtiEvents` >> value has been set to `true`, which happens when an agent library is loaded into a running VM. >> The fix is to get rid of this caching. >> Another is that enabling `notifyJvmtiEvents` notifications needs synchronization. >> Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. >> The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. >> >> Testing: >> The originally failing tests now pass: >> >> runtime/vthread/RedefineClass.java >> runtime/vthread/TestObjectAllocationSampleEvent.java >> >> In progress: >> Run the tiers 1-6 to make sure there are no regressions. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > remove caching if notifyJvmtiEvents in yieldContinuation Need to keep this PR alive for a while.
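The caching problem described in the quoted text can be shown with a small, single-threaded sketch. This is not the `VirtualThread` code: the class `FlagCachingSketch`, the field `notifyEvents`, and both methods are hypothetical stand-ins, and the synchronization side of the fix (the JvmtiVTMSTransitionDisabler helper) is not modeled at all. The sketch only shows why a local copy of a flag misses an update that arrives mid-run, assuming the flag flips between the two checks.

```java
final class FlagCachingSketch {
    // Hypothetical stand-in for the notifyJvmtiEvents field.
    private static volatile boolean notifyEvents;

    /** Old pattern: read the flag once and reuse the local copy. */
    static int notificationsWithCachedFlag() {
        notifyEvents = false;
        boolean cached = notifyEvents; // single read at the start of the run
        int sent = 0;
        if (cached) {
            sent++;                    // "mount" side notification
        }
        notifyEvents = true;           // simulates an agent being loaded mid-run
        if (cached) {
            sent++;                    // "unmount" side: skipped, the copy is stale
        }
        return sent;
    }

    /** Fixed pattern: re-read the field at every decision point. */
    static int notificationsWithFreshReads() {
        notifyEvents = false;
        int sent = 0;
        if (notifyEvents) {
            sent++;
        }
        notifyEvents = true;           // same mid-run change as above
        if (notifyEvents) {
            sent++;                    // the update is observed this time
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println("cached flag: " + notificationsWithCachedFlag());
        System.out.println("fresh reads: " + notificationsWithFreshReads());
    }
}
```

In this simplified model the cached variant sends no notification after the flag changes, while the variant that re-reads the field sends one.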
------------- PR: https://git.openjdk.org/jdk/pull/11304 From kevinw at openjdk.org Thu Mar 2 10:16:02 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 2 Mar 2023 10:16:02 GMT Subject: RFR: 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" Message-ID: Test update for an occasional failure, which does not reproduce. The test failure in JDK-8303136 is at line 141 in the updated file here. It's the failure where isExceeded is true, but our sampled "used" value is not above the threshold. But while the comment says it's refreshing values, it does not refresh "used", so there could have been gc/promotion activity which hits the threshold outside of the test's control. Refreshing "used" is the addition here. Separately, the code at line 123 in the new file also claims to refresh the values, but it only refreshes the threshold, which we aren't changing. Not making it refresh "used" at that point looks correct, so remove the "if (used >= threshold)" as we have already checked that at line 116. ------------- Commit messages: - 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed Changes: https://git.openjdk.org/jdk/pull/12823/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12823&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303136 Stats: 12 lines in 1 file changed: 1 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12823.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12823/head:pull/12823 PR: https://git.openjdk.org/jdk/pull/12823 From prappo at openjdk.org Thu Mar 2 12:13:14 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Thu, 2 Mar 2023 12:13:14 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments Message-ID: Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API.
The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 @@ -17084,7 +17084,7 @@ throws IOException, ClassNotFoundException
readObject is called to restore the state of the - (@code BasicPermission} from a stream.
+ BasicPermission from a stream.
Parameters:
s - the ObjectInputStream from which data is read
Notes ----- * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. * I will update copyright years after (and if) the fix had been approved, as required. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/12826/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12826&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303480 Stats: 75 lines in 39 files changed: 0 ins; 0 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/12826.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12826/head:pull/12826 PR: https://git.openjdk.org/jdk/pull/12826 From mullan at openjdk.org Thu Mar 2 13:24:07 2023 From: mullan at openjdk.org (Sean Mullan) Date: Thu, 2 Mar 2023 13:24:07 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix had been approved, as required. security related changes look fine. ------------- Marked as reviewed by mullan (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From cjplummer at openjdk.org Thu Mar 2 16:06:34 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 16:06:34 GMT Subject: Integrated: 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java In-Reply-To: References: Message-ID: On Wed, 15 Feb 2023 00:13:22 GMT, Chris Plummer wrote: > While working on [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), I ended up doing a lot of cleanup of nsk/share/jdi/EventHandler.java, so much so that the changes distract from the actual bug fix, so I decided it would be best to first push them with a separate CR. Changes include: > > - The main change is merging waitForRequestedEvent() and waitForRequestedEventSet(). These methods are quite big and almost identical. I had to add some more code to them for [JDK-8289765](https://bugs.openjdk.org/browse/JDK-8289765), and decided it was best to merge them first rather than making this code cloning situation even worse. > - Remove EventFilter.filtered() call when dealing with an uncaught exception event. These events are never filtered. > - Add complain() method. 
We already have a display() method so you can just call display() instead of log.display(), and it also adds the "EventHandler> " prefix to the output, so I thought it would be good to do the same for log.complain() uses, especially since some of the uses also were adding the same prefix. > - Added a new EventListener that simply logs the event. > > The remaining changes are just minor local edits whose purpose should be obvious. > > If you need a bit of background on this code, read on. EventHandler has a run() method that continually queries the event queue for more JDI events. When they come in, the registered EventListeners all get various callbacks. First eventSetReceived() is called for each. Then eventReceived() is called for each listener for each Event in the EventSet. If any eventReceived() returns true, that means the event was handled and no more listeners should be called. Finally, eventSetComplete() is called for each listener. > > There are a number of default listeners registered that are always in place. See createDefaultListeners(). They aren't really of concern with this PR. The waitForRequestedEvent() and waitForRequestedEventSet() methods (now combined into waitForRequestedEventSetCommon()) also register listeners, with the main one being used to check for the specific event that has been requested. These listeners are prepended to the default list. Listeners are always called in the reverse of the order in which they were added. > > After setting up the listeners, waitForRequestedEventSetCommon() will block until a matching event has arrived. This is detected by the setting of en.set (and en.event). The listener will set them when the event matches, and this is done asynchronously from the thread that is running the run() method (remember, the run() method calls listener.eventSetReceived()). This pull request has now been integrated.
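The callback order described in the quoted background can be sketched roughly as follows. This is an illustration only, not the nsk EventHandler code: `ListenerDispatchSketch`, its nested `Listener` interface, and the string events are hypothetical stand-ins for EventHandler, EventListener, and JDI events, and the sketch assumes that a `true` return stops delivery of that one event to the remaining listeners.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListenerDispatchSketch {
    /** Hypothetical stand-in for the listener callbacks described above. */
    interface Listener {
        void eventSetReceived(List<String> eventSet);
        boolean eventReceived(String event); // true means "handled, stop here"
        void eventSetComplete(List<String> eventSet);
    }

    private final List<Listener> listeners = new ArrayList<>();

    /** Prepends, so listeners run in the reverse of the order they were added. */
    void addListener(Listener listener) {
        listeners.add(0, listener);
    }

    void dispatch(List<String> eventSet) {
        for (Listener l : listeners) {
            l.eventSetReceived(eventSet);
        }
        for (String event : eventSet) {
            for (Listener l : listeners) {
                if (l.eventReceived(event)) {
                    break; // handled: later listeners do not see this event
                }
            }
        }
        for (Listener l : listeners) {
            l.eventSetComplete(eventSet);
        }
    }

    /** Runs one event through a "default" and a later-added "requested" listener. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        ListenerDispatchSketch handler = new ListenerDispatchSketch();
        handler.addListener(recording("default", false, log));
        handler.addListener(recording("requested", true, log));
        handler.dispatch(Arrays.asList("E1"));
        return log;
    }

    private static Listener recording(String name, boolean handles, List<String> log) {
        return new Listener() {
            public void eventSetReceived(List<String> eventSet) { log.add(name + ":set"); }
            public boolean eventReceived(String event) { log.add(name + ":" + event); return handles; }
            public void eventSetComplete(List<String> eventSet) { log.add(name + ":complete"); }
        };
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In `demo()` the listener added last runs first and handles `E1`, so the default listener gets the set-level callbacks but never sees the event itself.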
Changeset: 0926d0cb Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/0926d0cbceb52f7b12cd69970ed0944d4ed2a242 Stats: 143 lines in 1 file changed: 46 ins; 70 del; 27 mod 8302516: Do some cleanup of nsk/share/jdi/EventHandler.java Reviewed-by: amenkov, kevinw, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/12568 From cjplummer at openjdk.org Thu Mar 2 16:15:17 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 16:15:17 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unuque vtables In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 02:41:12 GMT, Alex Menkov wrote: > Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. > The fix adds test to verify this requirement. > > The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) Changes requested by cjplummer (Reviewer). test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 94: > 92: > 93: private void runTest() throws Throwable { > 94: Map> types = new HashMap<>(); I think a better name than "types" is needed. Something like "vtableAddressToTypeMap". test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 96: > 94: Map> types = new HashMap<>(); > 95: Iterator it = agent.getTypeDataBase().getTypes(); > 96: int dupFound = 0; Should be "dupsFound" test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 112: > 110: } > 111: > 112: if (vtable == null && t.getSuperclass() != null) { Why only log if there is no superclass? test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 118: > 116: + ", JPrimitive: " + t.isJavaPrimitiveType() > 117: + ", Oop: " + t.isOopType() > 118: + ", Ptr: " + t.isPointerType()); It appears that these always print "false". Are they worth having? 
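To make the naming suggestions above concrete, here is a minimal sketch of duplicate detection using `vtableAddressToTypeMap` and `dupsFound`. It is not the test under review: `VtableDupSketch` is a hypothetical class, and plain `String` type names and `Long` addresses stand in for the SA `Type` and `Address` objects, which are assumptions made so the sketch is self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VtableDupSketch {
    /**
     * Counts types whose vtable address was already claimed by an earlier type.
     * A null address means the type has no vtable and is skipped.
     */
    static int countDuplicates(Map<String, Long> typeToVtable) {
        Map<Long, List<String>> vtableAddressToTypeMap = new HashMap<>();
        int dupsFound = 0;
        for (Map.Entry<String, Long> entry : typeToVtable.entrySet()) {
            Long vtable = entry.getValue();
            if (vtable == null) {
                continue; // nothing to compare for this type
            }
            List<String> owners =
                    vtableAddressToTypeMap.computeIfAbsent(vtable, k -> new ArrayList<>());
            if (!owners.isEmpty()) {
                dupsFound++; // another type already uses this address
            }
            owners.add(entry.getKey());
        }
        return dupsFound;
    }

    public static void main(String[] args) {
        Map<String, Long> sample = new LinkedHashMap<>();
        sample.put("TypeA", 0x1000L);
        sample.put("TypeB", 0x2000L);
        sample.put("TypeC", 0x1000L); // shares an address with TypeA
        sample.put("TypeD", null);    // no vtable
        System.out.println("dupsFound = " + countDuplicates(sample));
    }
}
```

Keeping the list of owners per address, rather than a single owner, makes it possible to report every type that shares a vtable when a duplicate is found.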
------------- PR: https://git.openjdk.org/jdk/pull/12820 From matsaave at openjdk.org Thu Mar 2 16:55:23 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 2 Mar 2023 16:55:23 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry Message-ID: The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
This change supports the following platforms: x86, aarch64, PPC, and RISCV ------------- Commit messages: - PPC and RISCV port - 8301995: Move invokedynamic resolution information out of the cpCache Changes: https://git.openjdk.org/jdk/pull/12778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301995 Stats: 1418 lines in 54 files changed: 1036 ins; 168 del; 214 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From cjplummer at openjdk.org Thu Mar 2 17:49:58 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 17:49:58 GMT Subject: RFR: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java Message-ID: attach002a is problem listed under [JDK-8277812](https://bugs.openjdk.org/browse/JDK-8277812), which has been closed as a dup of [JDK-8277573](https://bugs.openjdk.org/browse/JDK-8277573), so its problem list entry should be updated to reflect this. The other issue is that it is currently in the general problem list, but only occurs with -Xcomp, so it needs to be moved to ProblemList-Xcomp.txt. 
------------- Commit messages: - Fix problem list entry for attach002a Changes: https://git.openjdk.org/jdk/pull/12836/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12836&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303523 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12836.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12836/head:pull/12836 PR: https://git.openjdk.org/jdk/pull/12836 From mchung at openjdk.org Thu Mar 2 17:53:07 2023 From: mchung at openjdk.org (Mandy Chung) Date: Thu, 2 Mar 2023 17:53:07 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v4] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Thu, 2 Mar 2023 08:18:03 GMT, Alan Bateman wrote: >> This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. >> >> As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. >> >> The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. 
It also fixes ThreadMXBean.getCurrentThreadCpuTime and getCurrentThreadUserTime to not throw UOE when CPU time measurement from a platform thread is supported. There are some small adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. >> >> Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Tweak javadoc to avoid listing too many conditions in @return description > - Merge > - Update isXXXThreadCpuTimeSupported descriptions > - Clarify Thread CPU time seciton of spec > - Merge > - Fix minimal build > - Fix minimal build > - Initial commit Marked as reviewed by mchung (Reviewer). Looks good. ------------- PR: https://git.openjdk.org/jdk/pull/12762 From cjplummer at openjdk.org Thu Mar 2 18:03:09 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 18:03:09 GMT Subject: RFR: 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 09:20:23 GMT, Kevin Walls wrote: > Test update for an occasional failure, which does not reproduce. > > The test failure in JDK-8303136 is at line 141 in the updated file here. It's the failure where isExceeded is true, but our sampled "used" value is not above the threshold. 
But while the comment says it's refreshing values, it does not refresh "used", so there could have been gc/promotion activity which hits the threshold outside of the test's control. Refreshing "used" is the addition here. > > Separately, the code at line 123 in the new file also claims to refresh the values, but it only refreshes the threshold, which we aren't changing. Not making it refresh "used" at that point looks correct, so remove the "if (used >= threshold)" as we have already checked that at line 116. test/hotspot/jtreg/vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded001.java line 126: > 124: threshold = monitor.getCollectionThreshold(pool); > 125: usage = monitor.getCollectionUsage(pool); > 126: if (used >= threshold) { Although `used` is not updated, `threshold` is. Couldn't removing this extra check result in new failures? ------------- PR: https://git.openjdk.org/jdk/pull/12823 From aturbanov at openjdk.org Thu Mar 2 18:43:19 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 2 Mar 2023 18:43:19 GMT Subject: Integrated: 8303267: Prefer ArrayList to LinkedList in ConcurrentLocksPrinter In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 12:54:24 GMT, Andrey Turbanov wrote: > LinkedList is used as the value type in the `sun.jvm.hotspot.runtime.ConcurrentLocksPrinter#locksMap` map. > There are only add/iterator calls on these lists. There are no removals from the head or anything like that. Not sure why LinkedList was used, but ArrayList should be preferred as the more efficient and more widely used collection. > > I've also done some related code cleanup: > 1. Mark field `locksMap` as final > 2. Use Map.computeIfAbsent > 3. Use an enhanced for loop instead of a `for` loop with an iterator This pull request has now been integrated.
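The three cleanups listed in the quoted description follow a common pattern, sketched below under stated assumptions. This is not the ConcurrentLocksPrinter source: `LocksMapSketch` is a hypothetical class, and `String` keys and values stand in for the thread and lock objects the real map holds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocksMapSketch {
    // 1. The field is final: the map is assigned once and only mutated afterwards.
    private final Map<String, List<String>> locksMap = new HashMap<>();

    void addLock(String thread, String lock) {
        // 2. computeIfAbsent replaces the get / null check / put sequence.
        //    ArrayList suffices because entries are only appended and iterated.
        locksMap.computeIfAbsent(thread, k -> new ArrayList<>()).add(lock);
    }

    List<String> locksFor(String thread) {
        return locksMap.getOrDefault(thread, Collections.emptyList());
    }

    String describe(String thread) {
        StringBuilder sb = new StringBuilder(thread).append(" holds:");
        // 3. An enhanced for loop replaces the explicit Iterator.
        for (String lock : locksFor(thread)) {
            sb.append(' ').append(lock);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LocksMapSketch sketch = new LocksMapSketch();
        sketch.addLock("t1", "lockA");
        sketch.addLock("t1", "lockB");
        System.out.println(sketch.describe("t1"));
    }
}
```

Since nothing is ever removed from the head of the per-thread lists, the main structural advantage of LinkedList does not apply here.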
Changeset: d4dcba04 Author: Andrey Turbanov URL: https://git.openjdk.org/jdk/commit/d4dcba04632f07555e4fe5547ee39125935a03c6 Stats: 10 lines in 1 file changed: 0 ins; 5 del; 5 mod 8303267: Prefer ArrayList to LinkedList in ConcurrentLocksPrinter Reviewed-by: cjplummer, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12763 From xuelei at openjdk.org Thu Mar 2 19:27:11 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Thu, 2 Mar 2023 19:27:11 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent Message-ID: Hi, May I have this update reviewed? The sprintf is deprecated in Xcode 14 because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.hotspot.agent module. Thanks, Xuelei ------------- Commit messages: - 8303527: update for deprecated sprintf for jdk.hotspot.agent Changes: https://git.openjdk.org/jdk/pull/12837/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12837&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303527 Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12837.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12837/head:pull/12837 PR: https://git.openjdk.org/jdk/pull/12837 From cjplummer at openjdk.org Thu Mar 2 19:34:47 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 19:34:47 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 19:17:40 GMT, Xue-Lei Andrew Fan wrote: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns. 
The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.hotspot.agent module. > > Thanks, > Xuelei src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp line 188: > 186: const HRESULT hr = (v); \ > 187: if (hr != S_OK) { \ > 188: size_t errmsg_size = new char[strlen(str) + 32; This looks broken. I doubt it even compiles. Also, this is win32 so shouldn't be needed for xcode, although it doesn't hurt to fix. ------------- PR: https://git.openjdk.org/jdk/pull/12837 From amenkov at openjdk.org Thu Mar 2 20:42:12 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 Mar 2023 20:42:12 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unuque vtables [v2] In-Reply-To: References: Message-ID: > Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. > The fix adds test to verify this requirement. 
> > The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: addressed feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12820/files - new: https://git.openjdk.org/jdk/pull/12820/files/4521c29d..ceac098b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12820&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12820&range=00-01 Stats: 42 lines in 2 files changed: 9 ins; 14 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/12820.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12820/head:pull/12820 PR: https://git.openjdk.org/jdk/pull/12820 From amenkov at openjdk.org Thu Mar 2 20:42:16 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 Mar 2023 20:42:16 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unuque vtables [v2] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 03:39:27 GMT, Chris Plummer wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed feedback > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 84: > >> 82: MethodHandles.Lookup classLookup = MethodHandles.privateLookupIn(BasicTypeDataBase.class, lookup); >> 83: vtblForType = classLookup.findVirtual(BasicTypeDataBase.class, "vtblForType", >> 84: MethodType.methodType(Address.class, Type.class)); > > I think it would be ok to just make vtblForType() public so you won't need to use reflection. The public SA APIs are not a spec. We are free to do things in the future that are not backwards compatible. Agree. Done. Also made one field of the BasicTypeDataBase private (as it should be) > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 94: > >> 92: >> 93: private void runTest() throws Throwable { >> 94: Map> types = new HashMap<>(); > > I think a better name than "types" is needed. 
Something like "vtableAddressToTypeMap". Fixed (but used shorter name) ------------- PR: https://git.openjdk.org/jdk/pull/12820 From amenkov at openjdk.org Thu Mar 2 20:45:16 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 Mar 2023 20:45:16 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unuque vtables [v2] In-Reply-To: References: Message-ID: <5haZbJW44YKO7x2A-gC4CTl8Tc5LZg_hdofhHfiaNuE=.d88722a8-91a4-48a0-a350-feac97624ed8@github.com> On Thu, 2 Mar 2023 16:06:41 GMT, Chris Plummer wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed feedback > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 112: > >> 110: } >> 111: >> 112: if (vtable == null && t.getSuperclass() != null) { > > Why only log if there is no superclass? vmStruct contains a lot of entries which should have super == null I added logging of some stats to the test, on win-x64-fastdebug it reports: total: 861, no vtable: 503, no_vtable_with_super: 24 if test would log everything with no vtable, test log is truncated ------------- PR: https://git.openjdk.org/jdk/pull/12820 From amenkov at openjdk.org Thu Mar 2 20:53:15 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 Mar 2023 20:53:15 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unuque vtables [v2] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 16:09:51 GMT, Chris Plummer wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed feedback > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 96: > >> 94: Map> types = new HashMap<>(); >> 95: Iterator it = agent.getTypeDataBase().getTypes(); >> 96: int dupFound = 0; > > Should be "dupsFound" Fixed. 
> test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 118: > >> 116: + ", JPrimitive: " + t.isJavaPrimitiveType() >> 117: + ", Oop: " + t.isOopType() >> 118: + ", Ptr: " + t.isPointerType()); > > It appears that these always print "false". Are they worth having? It reports all available info about the Type. As these are "suspicious" types I think it make sense to provide more info ------------- PR: https://git.openjdk.org/jdk/pull/12820 From prr at openjdk.org Thu Mar 2 21:29:04 2023 From: prr at openjdk.org (Phil Race) Date: Thu, 2 Mar 2023 21:29:04 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix has been approved, as required. java.desktop changes are fine ------------- Marked as reviewed by prr (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From kevinw at openjdk.org Thu Mar 2 21:43:17 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 2 Mar 2023 21:43:17 GMT Subject: RFR: 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 18:00:28 GMT, Chris Plummer wrote: >> Test update for an occasional failure, which does not reproduce. >> >> The test failure in JDK-8303136 is at line 141 in the updated file here. It's the failure where isExceeded is true, but our sampled "used" value is not above the threshold. But while the comment says it's refreshing values, it does not refresh "used", so there could have been gc/promotion activity which hits the threshold outside of the test's control. Refreshing "used" is the addition here. >> >> Separately, the code at line 123 in the new file also claims to refresh the values, but it only refreshes the threshold, which we aren't changing. Not making it refresh "used" at that point looks correct, so remove the "if (used >= threshold)" as we have already checked that at line 116.
> > test/hotspot/jtreg/vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded001.java line 126: > >> 124: threshold = monitor.getCollectionThreshold(pool); >> 125: usage = monitor.getCollectionUsage(pool); >> 126: if (used >= threshold) { > > Although `used` is not updated, `threshold` is. Couldn't removing this extra check result in new failures? I think that's OK. We set the threshold earlier in the test, to the value of used+1, we don't need to read threshold back again. I see in testing that it reads back the value that it set, as it should, this value isn't changed again. (The test's monitor.setCollectionThreshold method (MemoryMonitor.java) for setting this ends up calling into MemoryPoolImpl.java) The testing at lines 116 - 121 is checking if we have crossed the threshold, and if so then isExceeded really should be true. It's surviving testing so far! ------------- PR: https://git.openjdk.org/jdk/pull/12823 From ayang at openjdk.org Thu Mar 2 21:57:06 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 2 Mar 2023 21:57:06 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace Message-ID: Simple refactoring of merging two types. 
Test: tier1-5 ------------- Commit messages: - merge-type Changes: https://git.openjdk.org/jdk/pull/12841/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12841&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303534 Stats: 191 lines in 14 files changed: 27 ins; 122 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/12841.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12841/head:pull/12841 PR: https://git.openjdk.org/jdk/pull/12841 From cjplummer at openjdk.org Thu Mar 2 22:07:30 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 22:07:30 GMT Subject: RFR: 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 09:20:23 GMT, Kevin Walls wrote: > Test update for an occasional failure, which does not reproduce. > > The test failure in JDK-8303136 is at line 141 in the updated file here. It's the failure where isExceeded is true, but our sampled "used" value is not above the threshold. But while the comment says it's refreshing values, it does not refresh "used", so there could have been gc/promotion activity which hits the threshold outside of the test's control. Refreshing "used" is the addition here. > > Separately, the code at line 123 in the new file also claims to refresh the values, but it only refreshes the threshold, which we aren't changing. Not making it refresh "used" at that point looks correct, so remove the "if (used >= threshold)" as we have already checked that at line 116. Marked as reviewed by cjplummer (Reviewer).
------------- PR: https://git.openjdk.org/jdk/pull/12823 From cjplummer at openjdk.org Thu Mar 2 22:14:04 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 22:14:04 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 21:49:39 GMT, Albert Mingkun Yang wrote: > Simple refactoring of merging two types. > > Test: tier1-5 Copyrights need updating in a few files. ------------- PR: https://git.openjdk.org/jdk/pull/12841 From cjplummer at openjdk.org Thu Mar 2 22:18:08 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 22:18:08 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: <63UHTjtrUOVGBTwRt_M4QJ7aqBnuAGqekNTTTl3GM74=.ddedac04-ff87-40b9-9ea7-6b6d26d9d202@github.com> On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix had been approved, as required. The SA changes (jdk.hotspot.agent) look fine. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From cjplummer at openjdk.org Thu Mar 2 22:29:19 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 22:29:19 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v2] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 20:42:12 GMT, Alex Menkov wrote: >> Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. >> The fix adds test to verify this requirement. >> >> The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > addressed feedback Changes requested by cjplummer (Reviewer). test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 85: > 83: int dupsFound = 0; > 84: // agent.getTypeDataBase() returns HotSpotTypeDataBase (extends BasicTypeDataBase) > 85: BasicTypeDataBase typeDB = (BasicTypeDataBase)(agent.getTypeDataBase()); I don't think the cast is needed. test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 94: > 92: Address vtable = typeDB.vtblForType(t); > 93: if (vtable != null) { > 94: no_vtable++; `no_vtable` is actually tracking the number of Types with a vtable. 
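As a rough sketch of the bookkeeping discussed in this comment, the counters could be named for what they actually count while duplicates are detected by grouping types under their vtable address. Everything here is an assumption made for illustration: the class `VtableStatsSketch`, the `Stats` fields, and the use of a type-name to address map in place of the SA `Type` and `Address` objects. It is not the code from `UniqueVtableTest.java`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain strings and longs stand in for SA Type/Address.
public class VtableStatsSketch {
    public static final class Stats {
        public int total;          // all types examined
        public int withVtable;     // types that have a vtable address
        public int withoutVtable;  // types whose vtable address is null
        public int dupsFound;      // types sharing an already-seen vtable
    }

    // vtableByType maps a type name to its vtable address, or null if none.
    public static Stats collect(Map<String, Long> vtableByType) {
        Stats stats = new Stats();
        Map<Long, List<String>> vtableToTypes = new HashMap<>();
        for (Map.Entry<String, Long> e : vtableByType.entrySet()) {
            stats.total++;
            Long vtable = e.getValue();
            if (vtable == null) {
                stats.withoutVtable++;
                continue;
            }
            stats.withVtable++;
            List<String> owners =
                vtableToTypes.computeIfAbsent(vtable, k -> new ArrayList<>());
            if (!owners.isEmpty()) {
                // Another type already claimed this vtable address.
                stats.dupsFound++;
            }
            owners.add(e.getKey());
        }
        return stats;
    }
}
```

Keeping `withVtable` and `withoutVtable` as separate counters avoids the mismatch between a counter's name and the branch that increments it.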
------------- PR: https://git.openjdk.org/jdk/pull/12820 From cjplummer at openjdk.org Thu Mar 2 22:29:22 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 22:29:22 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v2] In-Reply-To: <5haZbJW44YKO7x2A-gC4CTl8Tc5LZg_hdofhHfiaNuE=.d88722a8-91a4-48a0-a350-feac97624ed8@github.com> References: <5haZbJW44YKO7x2A-gC4CTl8Tc5LZg_hdofhHfiaNuE=.d88722a8-91a4-48a0-a350-feac97624ed8@github.com> Message-ID: On Thu, 2 Mar 2023 20:42:16 GMT, Alex Menkov wrote: >> test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 112: >> >>> 110: } >>> 111: >>> 112: if (vtable == null && t.getSuperclass() != null) { >> >> Why only log if there is no superclass? > > vmStruct contains a lot of entries which should have vtable == null > I added logging of some stats to the test, on win-x64-fastdebug it reports: > total: 861, no vtable: 503, no_vtable_with_super: 24 > if test would log everything with no vtable, test log is truncated In that case is it worth listing the "no vtable" classes that have a superclass? Are they somehow more interesting than those without a superclass? ------------- PR: https://git.openjdk.org/jdk/pull/12820 From dholmes at openjdk.org Thu Mar 2 23:02:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Mar 2023 23:02:15 GMT Subject: RFR: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 17:30:07 GMT, Chris Plummer wrote: > attach002a is problem listed under [JDK-8277812](https://bugs.openjdk.org/browse/JDK-8277812), which has been closed as a dup of [JDK-8277573](https://bugs.openjdk.org/browse/JDK-8277573), so its problem list entry should be updated to reflect this. The other issue is that it is currently in the general problem list, but only occurs with -Xcomp, so it needs to be moved to ProblemList-Xcomp.txt. LGTM. 
Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12836 From xuelei at openjdk.org Thu Mar 2 23:13:44 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Thu, 2 Mar 2023 23:13:44 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent [v2] In-Reply-To: References: Message-ID: <5-WfZ5trxZHHBTvflCVvbMXeprs3VCoGSdj2R0lm6NY=.102b1c6c-271d-4f4d-9c28-1a9445789c8e@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.hotspot.agent module. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with two additional commits since the last revision: - one more correction - correct mistakes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12837/files - new: https://git.openjdk.org/jdk/pull/12837/files/4990fb1b..c9c17d9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12837&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12837&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12837.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12837/head:pull/12837 PR: https://git.openjdk.org/jdk/pull/12837 From amenkov at openjdk.org Thu Mar 2 23:16:13 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 Mar 2023 23:16:13 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v2] In-Reply-To: References: <5haZbJW44YKO7x2A-gC4CTl8Tc5LZg_hdofhHfiaNuE=.d88722a8-91a4-48a0-a350-feac97624ed8@github.com> Message-ID: On Thu, 2 Mar 
2023 22:17:27 GMT, Chris Plummer wrote: >> vmStruct contains a lot of entries which should have vtable == null >> I added logging of some stats to the test, on win-x64-fastdebug it reports: >> total: 861, no vtable: 503, no_vtable_with_super: 24 >> if test would log everything with no vtable, test log is truncated > > In that case is it worth listing the "no vtable" classes that have a superclass? Are they somehow more interesting than those without a superclass? If the superclass is set, then the type describes a VM class (but some JM classes have super == null). Maybe the test should report all Types with CInt: false, CStr: false, JPrimitive: false, Oop: false, Ptr: false. It looks like these entries correspond to declare_toplevel_type/declare_type in vmStruct ------------- PR: https://git.openjdk.org/jdk/pull/12820 From xuelei at openjdk.org Thu Mar 2 23:30:17 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Thu, 2 Mar 2023 23:30:17 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent [v2] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 19:31:46 GMT, Chris Plummer wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with two additional commits since the last revision: >> >> - one more correction >> - correct mistakes > > src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp line 188: > >> 186: const HRESULT hr = (v); \ >> 187: if (hr != S_OK) { \ >> 188: size_t errmsg_size = new char[strlen(str) + 32; > > This looks broken. I doubt it even compiles. Also, this is win32 so shouldn't be needed for xcode, although it doesn't hurt to fix. Oops, I should have avoided this mistake. Thank you for catching it. The sprintf function is deprecated because of security concerns. If it is used, code readers may need to check whether the usage is secure, which is not really necessary.
If I get it right, [the function is deprecated](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170) in Microsoft C compiler as well. But it looks like the deprecation does not trigger a building failure yet. ------------- PR: https://git.openjdk.org/jdk/pull/12837 From amenkov at openjdk.org Thu Mar 2 23:36:14 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 Mar 2023 23:36:14 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v2] In-Reply-To: References: Message-ID: <14KvJxnwUQ13Te2xwNpTQjF0YoGDIH_wCh6b7_JS3_U=.f0236e60-d419-4af0-97b8-22cd90a39d8c@github.com> On Thu, 2 Mar 2023 22:20:10 GMT, Chris Plummer wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed feedback > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 85: > >> 83: int dupsFound = 0; >> 84: // agent.getTypeDataBase() returns HotSpotTypeDataBase (extends BasicTypeDataBase) >> 85: BasicTypeDataBase typeDB = (BasicTypeDataBase)(agent.getTypeDataBase()); > > I don't think the cast is needed. agent.getTypeDataBase() returns TypeDataBase, we need BasicTypeDataBase ------------- PR: https://git.openjdk.org/jdk/pull/12820 From cjplummer at openjdk.org Fri Mar 3 00:44:04 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 00:44:04 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent [v2] In-Reply-To: <5-WfZ5trxZHHBTvflCVvbMXeprs3VCoGSdj2R0lm6NY=.102b1c6c-271d-4f4d-9c28-1a9445789c8e@github.com> References: <5-WfZ5trxZHHBTvflCVvbMXeprs3VCoGSdj2R0lm6NY=.102b1c6c-271d-4f4d-9c28-1a9445789c8e@github.com> Message-ID: <0zO18zSkKidhJA1xSD8iu3_1NZPebFNUEPBmPlVOh2w=.22ef3d8e-97a9-4073-887f-e1ed0d752009@github.com> On Thu, 2 Mar 2023 23:13:44 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? 
>> >> The sprintf is deprecated in Xcode 14 because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.hotspot.agent module. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with two additional commits since the last revision: > > - one more correction > - correct mistakes You will need to update the copyright in src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c ------------- Changes requested by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12837 From cjplummer at openjdk.org Fri Mar 3 00:47:04 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 00:47:04 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v2] In-Reply-To: <14KvJxnwUQ13Te2xwNpTQjF0YoGDIH_wCh6b7_JS3_U=.f0236e60-d419-4af0-97b8-22cd90a39d8c@github.com> References: <14KvJxnwUQ13Te2xwNpTQjF0YoGDIH_wCh6b7_JS3_U=.f0236e60-d419-4af0-97b8-22cd90a39d8c@github.com> Message-ID: On Thu, 2 Mar 2023 23:32:47 GMT, Alex Menkov wrote: >> test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 85: >> >>> 83: int dupsFound = 0; >>> 84: // agent.getTypeDataBase() returns HotSpotTypeDataBase (extends BasicTypeDataBase) >>> 85: BasicTypeDataBase typeDB = (BasicTypeDataBase)(agent.getTypeDataBase()); >> >> I don't think the cast is needed. > > agent.getTypeDataBase() returns TypeDataBase, we need BasicTypeDataBase Your comment seems to indicate otherwise, although I think what you meant is that the declared return type for agent.getTypeDataBase() is TypeDataBase, but it actually returns an object of type HotSpotTypeDataBase, which is a subclass of BasicTypeDataBase. 
You should make that more clear. ------------- PR: https://git.openjdk.org/jdk/pull/12820 From yyang at openjdk.org Fri Mar 3 02:12:18 2023 From: yyang at openjdk.org (Yi Yang) Date: Fri, 3 Mar 2023 02:12:18 GMT Subject: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v2] In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 03:01:22 GMT, Serguei Spitsyn wrote: >> Yi Yang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: >> >> - separate renaming >> - 8299518: HotSpotVirtualMachine shared code across different platforms > > I like the approach in general. > Also, I agree with David on his comments, especially on the renaming. > The abstract methods `readImpl()` and `closeImpl()` is better to name as `read()` and `close()`. @sspitsyn @dholmes-ora @turbanoff May I ask your help to review this patch? Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/11823 From amenkov at openjdk.org Fri Mar 3 02:28:32 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 3 Mar 2023 02:28:32 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v3] In-Reply-To: References: Message-ID: > Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. > The fix adds test to verify this requirement. 
> > The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Chris's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12820/files - new: https://git.openjdk.org/jdk/pull/12820/files/ceac098b..903be28e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12820&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12820&range=01-02 Stats: 20 lines in 1 file changed: 6 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/12820.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12820/head:pull/12820 PR: https://git.openjdk.org/jdk/pull/12820 From amenkov at openjdk.org Fri Mar 3 02:33:15 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 3 Mar 2023 02:33:15 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v2] In-Reply-To: References: <14KvJxnwUQ13Te2xwNpTQjF0YoGDIH_wCh6b7_JS3_U=.f0236e60-d419-4af0-97b8-22cd90a39d8c@github.com> Message-ID: On Fri, 3 Mar 2023 00:44:31 GMT, Chris Plummer wrote: >> agent.getTypeDataBase() returns TypeDataBase, we need BasicTypeDataBase > > Your comment seems to indicate otherwise, although I think what you meant is that the declared return type for agent.getTypeDataBase() is TypeDataBase, but it actually returns an object of type HotSpotTypeDataBase, which is a subclass of BasicTypeDataBase. You should make that more clear. Updated the comment. 
Hope it's clearer now ------------- PR: https://git.openjdk.org/jdk/pull/12820 From amenkov at openjdk.org Fri Mar 3 02:33:19 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 3 Mar 2023 02:33:19 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v2] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 22:24:35 GMT, Chris Plummer wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed feedback > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 94: > >> 92: Address vtable = typeDB.vtblForType(t); >> 93: if (vtable != null) { >> 94: no_vtable++; > > `no_vtable` is actually tracking the number of Types with a vtable. Changed reported stats ------------- PR: https://git.openjdk.org/jdk/pull/12820 From amenkov at openjdk.org Fri Mar 3 02:33:20 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 3 Mar 2023 02:33:20 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v3] In-Reply-To: References: <5haZbJW44YKO7x2A-gC4CTl8Tc5LZg_hdofhHfiaNuE=.d88722a8-91a4-48a0-a350-feac97624ed8@github.com> Message-ID: On Thu, 2 Mar 2023 23:13:36 GMT, Alex Menkov wrote: >> In that case is it worth listing the "no vtable" classes that have a superclass? Are they somehow more interesting than those without a superclass? 
> > If superclass is set then it's the type which describes VM class (but some JM classes have super == null) > Maybe the test should report all Types with CInt: false, CStr: false, JPrimitive: false, Oop: false, Ptr: false > Looks like these entries correspond declare_toplevel_type/declare_type in vmStruct Updated criteria to log classes without vtable ------------- PR: https://git.openjdk.org/jdk/pull/12820 From amenkov at openjdk.org Fri Mar 3 02:33:22 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 3 Mar 2023 02:33:22 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v3] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 20:50:36 GMT, Alex Menkov wrote: >> test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 118: >> >>> 116: + ", JPrimitive: " + t.isJavaPrimitiveType() >>> 117: + ", Oop: " + t.isOopType() >>> 118: + ", Ptr: " + t.isPointerType()); >> >> It appears that these always print "false". Are they worth having? > > It reports all available info about the Type. As these are "suspicious" types I think it make sense to provide more info removed this property as by the updated condition logged classes have all "false" ------------- PR: https://git.openjdk.org/jdk/pull/12820 From xuelei at openjdk.org Fri Mar 3 04:19:47 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 3 Mar 2023 04:19:47 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent [v3] In-Reply-To: References: Message-ID: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . 
This is a break-down update for sprintf uses in jdk.hotspot.agent module. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12837/files - new: https://git.openjdk.org/jdk/pull/12837/files/c9c17d9e..cf0fc0d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12837&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12837&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12837.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12837/head:pull/12837 PR: https://git.openjdk.org/jdk/pull/12837 From dholmes at openjdk.org Fri Mar 3 05:07:16 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 05:07:16 GMT Subject: RFR: 8303151: DCmd framework cleanups Message-ID: Whilst working on the DCmd code I noticed two items that could be cleaned up: 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. 
Testing: tiers 1-3 Thanks ------------- Commit messages: - 8303151: DCmd framework cleanups Changes: https://git.openjdk.org/jdk/pull/12847/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12847&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303151 Stats: 13 lines in 3 files changed: 1 ins; 11 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12847/head:pull/12847 PR: https://git.openjdk.org/jdk/pull/12847 From dholmes at openjdk.org Fri Mar 3 05:30:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 05:30:14 GMT Subject: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v7] In-Reply-To: References: Message-ID: On Tue, 28 Feb 2023 06:17:29 GMT, Yi Yang wrote: >> harmless refactor to share code across different platforms of VirtualMachineImpl: >> 1. Shared code to process command response after requesting a command execution >> 2. Read functionality in SocketInputStream can be reused > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > fd set -1 Nothing further from me. Thanks. src/jdk.attach/share/classes/sun/tools/attach/HotSpotVirtualMachine.java line 375: > 373: * Utility method to process the completion status after command execution. > 374: * If we get IOE during previous command execution, delay throwing it until > 375: * completion status have been read. Nit: s/have/has/ ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/11823 From yyang at openjdk.org Fri Mar 3 06:07:55 2023 From: yyang at openjdk.org (Yi Yang) Date: Fri, 3 Mar 2023 06:07:55 GMT Subject: RFR: 8299518: HotSpotVirtualMachine shared code across different platforms [v8] In-Reply-To: References: Message-ID: > harmless refactor to share code across different platforms of VirtualMachineImpl: > 1. Shared code to process command response after requesting a command execution > 2. 
Read functionality in SocketInputStream can be reused Yi Yang has updated the pull request incrementally with one additional commit since the last revision: have/has ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11823/files - new: https://git.openjdk.org/jdk/pull/11823/files/6033fcc4..5dabc62e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11823&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11823.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11823/head:pull/11823 PR: https://git.openjdk.org/jdk/pull/11823 From yyang at openjdk.org Fri Mar 3 06:20:10 2023 From: yyang at openjdk.org (Yi Yang) Date: Fri, 3 Mar 2023 06:20:10 GMT Subject: RFR: 8303151: DCmd framework cleanups In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 04:59:44 GMT, David Holmes wrote: > Whilst working on the DCmd code I noticed two items that could be cleaned up: > > 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. > > 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. > > Testing: tiers 1-3 > > Thanks Is it possible to remove DCmdRegistrant too? Maybe `void DCmdFactory::register_dcmds` or `void register_dcmds` ------------- PR: https://git.openjdk.org/jdk/pull/12847 From aivanov at openjdk.org Fri Mar 3 08:28:19 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 3 Mar 2023 08:28:19 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. 
the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix has been approved, as required. Looks good to me. I looked through all the changes, paying more attention to the client area. src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java line 257: > 255: > 256: /** > 257: * @return true iff the BSM method type exactly matches I assume "iff" should be "if"? src/jdk.compiler/share/classes/com/sun/tools/javac/code/Types.java line 2866: > 2864: * Merge multiple abstract methods. The preferred method is a method that is a subsignature > 2865: * of all the other signatures and whose return type is more specific {@link MostSpecificReturnCheck}. > 2866: * The resulting preferred method has a thrown clause that is the intersection of the merged Is it "...has a {@code throws} clause..."? ------------- Marked as reviewed by aivanov (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From prappo at openjdk.org Fri Mar 3 09:41:06 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Fri, 3 Mar 2023 09:41:06 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 16:23:17 GMT, Alexey Ivanov wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e.
the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix has been approved, as required. > > src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java line 257: > >> 255: >> 256: /** >> 257: * @return true iff the BSM method type exactly matches > > I assume "iff" should be "if"? Here and elsewhere in this file "iff" might mean [if and only if](https://en.wikipedia.org/wiki/If_and_only_if), which would make sense. (FWIW, there are a few hundred occurrences of the word "iff" in src.) @cl4es (Claes Redestad), as the author of those lines would you like to chime in? Since Claes might read this, I note that when I changed unsupported `{@see}` to `{@link}` throughout this file, my IDE could not resolve one of the links: `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,Class,MethodType,MethodHandle,MethodType)` While there's a similarly named method with slightly different parameters, I refrained from using it: `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,MethodType,MethodType,MethodHandle,MethodType)`.
------------- PR: https://git.openjdk.org/jdk/pull/12826 From prappo at openjdk.org Fri Mar 3 09:44:13 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Fri, 3 Mar 2023 09:44:13 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: <5TgKeBVz0u1hCa1qOiC7Y46DJvUtDIsDa1wv2I4tAX8=.8575f968-0685-450d-8d77-16523cd7531a@github.com> On Fri, 3 Mar 2023 08:15:49 GMT, Alexey Ivanov wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix has been approved, as required. > > src/jdk.compiler/share/classes/com/sun/tools/javac/code/Types.java line 2866: > >> 2864: * Merge multiple abstract methods. The preferred method is a method that is a subsignature >> 2865: * of all the other signatures and whose return type is more specific {@link MostSpecificReturnCheck}. >> 2866: * The resulting preferred method has a thrown clause that is the intersection of the merged > > Is it "...has a {@code throws} clause..."? Thanks! I'll add this to a separate PR. ------------- PR: https://git.openjdk.org/jdk/pull/12826 From redestad at openjdk.org Fri Mar 3 10:12:13 2023 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 3 Mar 2023 10:12:13 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> On Fri, 3 Mar 2023 09:38:13 GMT, Pavel Rappo wrote: >> src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java line 257: >> >>> 255: >>> 256: /** >>> 257: * @return true iff the BSM method type exactly matches >> >> I assume "iff" should be "if"? > > Here and elsewhere in this file "iff" might mean [if and only if](https://en.wikipedia.org/wiki/If_and_only_if), which would make sense. (FWIW, there are a few hundred occurrences of the word "iff" in src.) > > @cl4es (Claes Redestad), as the author of those lines would you like to chime in?
> > Since Claes might read this, I note that when I changed unsupported `{@see}` to `{@link}` throughout this file, my IDE could not resolve one of the links: `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,Class,MethodType,MethodHandle,MethodType)` > > While there's a similarly named method with slightly different parameters, I refrained from using it: > `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,MethodType,MethodType,MethodHandle,MethodType)`. Yes, iff means if-and-only-if and is used for extra precision in formal logic and mathematics. As @pavelrappo points out it's a relatively common occurrence in the OpenJDK sources, though perhaps not in the public javadocs. Perhaps a bit pretentious, but mostly a terse way to say "return true if the BSM method type exactly matches X, otherwise false". The broken link stems from the fact that the method I was targeting (a way to use condy for lambda proxy singletons rather than a `MethodHandle.constant`) was never integrated. We'll look at either getting that done (@briangoetz suggested the time might be ready for it) or removing this currently pointless static bootstrap specialization test. ------------- PR: https://git.openjdk.org/jdk/pull/12826 From aivanov at openjdk.org Fri Mar 3 11:34:16 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 3 Mar 2023 11:34:16 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> References: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> Message-ID: On Fri, 3 Mar 2023 10:09:27 GMT, Claes Redestad wrote: > Yes, iff means if-and-only-if and is used for extra precision in formal logic, mathematics. I've never come across it before. With your explanations, it makes perfect sense.
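For readers following the "iff" and `{@link}` discussion in this thread, here is a minimal sketch of how such a doc comment might look with well-formed inline tags. The class `IffDocSketch` and the method `exactlyMatches` are hypothetical names invented for illustration; they are not the code in BootstrapMethodInvoker.java. The `{@link}` target is the six-argument `LambdaMetafactory#metafactory` overload quoted above.

```java
import java.lang.invoke.MethodType;

// Hypothetical illustration class; not part of the JDK sources discussed above.
final class IffDocSketch {

    /**
     * Compares two method types, in the spirit of the check a bootstrap method
     * invoker might perform before calling
     * {@link java.lang.invoke.LambdaMetafactory#metafactory(java.lang.invoke.MethodHandles.Lookup,String,MethodType,MethodType,java.lang.invoke.MethodHandle,MethodType)}.
     *
     * @param actual   the method type that was found
     * @param expected the method type that is required
     * @return {@code true} iff {@code actual} exactly matches {@code expected},
     *         that is, {@code true} if they match and {@code false} otherwise
     */
    static boolean exactlyMatches(MethodType actual, MethodType expected) {
        // MethodType.equals compares the return type and all parameter types.
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        MethodType a = MethodType.methodType(String.class, int.class);
        MethodType b = MethodType.methodType(String.class, int.class);
        MethodType c = MethodType.methodType(Object.class, int.class);
        System.out.println(exactlyMatches(a, b)); // same return and parameter types
        System.out.println(exactlyMatches(a, c)); // return type differs
    }
}
```

Note the braces around `{@code ...}`: the `(@code BasicPermission}` form that this PR fixes opens with a parenthesis, so javadoc renders it as literal text.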
------------- PR: https://git.openjdk.org/jdk/pull/12826 From kevinw at openjdk.org Fri Mar 3 11:46:49 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 3 Mar 2023 11:46:49 GMT Subject: RFR: 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. [v2] In-Reply-To: References: Message-ID: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> > Deprecate the Java Management Extension (JMX) Subject Delegation feature for removal in a future release. > > Given no known usage, there is no replacement feature for JMX Subject Delegation. > > CSR is https://bugs.openjdk.org/browse/JDK-8298967 Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - deprecation text update - Revert "RMIConnection throw comments" This reverts commit aceb4fe44189245ac702f0c74c2bb1100a6d17fa. - Merge remote-tracking branch 'upstream/master' into Deprecate_SubjectDelegation - RMIConnection throw comments - 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/11880/files - new: https://git.openjdk.org/jdk/pull/11880/files/80c5c02d..b1516566 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11880&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11880&range=00-01 Stats: 149353 lines in 4665 files changed: 68677 ins; 36188 del; 44488 mod Patch: https://git.openjdk.org/jdk/pull/11880.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11880/head:pull/11880 PR: https://git.openjdk.org/jdk/pull/11880 From kevinw at openjdk.org Fri Mar 3 11:46:52 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 3 Mar 2023 11:46:52 GMT Subject: RFR: 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. In-Reply-To: References: Message-ID: <_tUVqc_XHmFvGXlhcWYwVT-nbMCJ4ZabWXH9coJvlKU=.4ed94427-d877-49ed-b0e3-fb46a9da6ae6@github.com> On Fri, 6 Jan 2023 12:02:37 GMT, Kevin Walls wrote: > Deprecate the Java Management Extension (JMX) Subject Delegation feature for removal in a future release. > > Given no known usage, there is no replacement feature for JMX Subject Delegation. > > CSR is https://bugs.openjdk.org/browse/JDK-8298967 Updated to sync deprecation text in JMXConnector.java with the updated text in the CSR. ------------- PR: https://git.openjdk.org/jdk/pull/11880 From kevinw at openjdk.org Fri Mar 3 12:23:14 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 3 Mar 2023 12:23:14 GMT Subject: RFR: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 17:30:07 GMT, Chris Plummer wrote: > attach002a is problem listed under [JDK-8277812](https://bugs.openjdk.org/browse/JDK-8277812), which has been closed as a dup of [JDK-8277573](https://bugs.openjdk.org/browse/JDK-8277573), so its problem list entry should be updated to reflect this. 
The other issue is that it is currently in the general problem list, but only occurs with -Xcomp, so it needs to be moved to ProblemList-Xcomp.txt. Marked as reviewed by kevinw (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/12836 From ayang at openjdk.org Fri Mar 3 12:30:38 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 3 Mar 2023 12:30:38 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: > Simple refactoring of merging two types. > > Test: tier1-5 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: copyright-year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12841/files - new: https://git.openjdk.org/jdk/pull/12841/files/dc9f901f..da1931a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12841&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12841&range=00-01 Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12841.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12841/head:pull/12841 PR: https://git.openjdk.org/jdk/pull/12841 From jsjolen at openjdk.org Fri Mar 3 13:31:14 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 3 Mar 2023 13:31:14 GMT Subject: RFR: 8303151: DCmd framework cleanups In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 04:59:44 GMT, David Holmes wrote: > Whilst working on the DCmd code I noticed two items that could be cleaned up: > > 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. > > 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. > > Testing: tiers 1-3 > > Thanks LGTM, thanks for the clean up. ------------- Marked as reviewed by jsjolen (Committer). 
PR: https://git.openjdk.org/jdk/pull/12847 From rriggs at openjdk.org Fri Mar 3 17:19:08 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Fri, 3 Mar 2023 17:19:08 GMT Subject: RFR: JDK-8303587 Remove VMOutOfMemoryError001 test from the problem list after 8303198 Message-ID: Remove VMOutOfMemoryException001.java from the problem list, after JDK-8303198. The logging of Runtime.exit interfered with out-of-memory exception handling in this test. Making the logging more robust in JDK-8303198 by handling exceptions restores the conditions expected by this test. ------------- Commit messages: - JDK-8303587 Remove VMOutOfMemoryError001 test from the problem list after 8303198 Changes: https://git.openjdk.org/jdk/pull/12859/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12859&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303587 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12859.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12859/head:pull/12859 PR: https://git.openjdk.org/jdk/pull/12859 From cjplummer at openjdk.org Fri Mar 3 17:27:07 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 17:27:07 GMT Subject: RFR: JDK-8303587 Remove VMOutOfMemoryError001 test from the problem list after 8303198 In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 16:40:41 GMT, Roger Riggs wrote: > Remove VMOutOfMemoryException001.java from the problem list, after JDK-8303198. > > The logging of Runtime.exit interfered with out-of-memory exception handling in this test. > Making the logging more robust in JDK-8303198 by handling exceptions restores the conditions expected by this test. Approved and trivial. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/12859 From pchilanomate at openjdk.org Fri Mar 3 17:33:10 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 3 Mar 2023 17:33:10 GMT Subject: RFR: 8303242: ThreadMXBean issues with virtual threads [v4] In-Reply-To: References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Thu, 2 Mar 2023 08:18:03 GMT, Alan Bateman wrote: >> This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. >> >> As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. >> >> The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. It also fixes ThreadMXBean.getCurrentThreadCpuTime and getCurrentThreadUserTime to not throw UOE when CPU time measurement from a platform thread is supported. There are some small adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. 
>> >> Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Tweak javadoc to avoid listing too many conditions in @return description > - Merge > - Update isXXXThreadCpuTimeSupported descriptions > - Clarify Thread CPU time seciton of spec > - Merge > - Fix minimal build > - Fix minimal build > - Initial commit Looks good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). PR: https://git.openjdk.org/jdk/pull/12762 From cjplummer at openjdk.org Fri Mar 3 17:37:05 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 17:37:05 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent [v3] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 04:19:47 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.hotspot.agent module. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > update copyright year Marked as reviewed by cjplummer (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/12837 From cjplummer at openjdk.org Fri Mar 3 17:41:14 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 17:41:14 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: <8gVdjyz3i-jfbjjlZ9Cw84YNqZzqDLmn9gmF68VyLbs=.074e5e4a-1564-4244-8df2-cb97974ce68d@github.com> On Fri, 3 Mar 2023 12:30:38 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of merging two types. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright-year SA changes look good. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12841 From cjplummer at openjdk.org Fri Mar 3 17:41:25 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 17:41:25 GMT Subject: Integrated: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 17:30:07 GMT, Chris Plummer wrote: > attach002a is problem listed under [JDK-8277812](https://bugs.openjdk.org/browse/JDK-8277812), which has been closed as a dup of [JDK-8277573](https://bugs.openjdk.org/browse/JDK-8277573), so its problem list entry should be updated to reflect this. The other issue is that it is currently in the general problem list, but only occurs with -Xcomp, so it needs to be moved to ProblemList-Xcomp.txt. This pull request has now been integrated. 
Changeset: 29ee7c3b Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/29ee7c3b70ded8cd124ca5b4a38a2aee7c39068b Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java Reviewed-by: dholmes, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/12836 From cjplummer at openjdk.org Fri Mar 3 17:54:17 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 17:54:17 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v3] In-Reply-To: References: Message-ID: <87g_McpxB9zTv6wUu_lsATbg31o6daWx-kUkjU_wHQU=.b514221f-e325-4ff7-9be9-ce4b55445a13@github.com> On Fri, 3 Mar 2023 02:28:32 GMT, Alex Menkov wrote: >> Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. >> The fix adds test to verify this requirement. >> >> The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Chris's feedback Looks good. Thanks for adding this. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12820 From xuelei at openjdk.org Fri Mar 3 18:15:37 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 3 Mar 2023 18:15:37 GMT Subject: Integrated: 8303527: update for deprecated sprintf for jdk.hotspot.agent In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 19:17:40 GMT, Xue-Lei Andrew Fan wrote: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns. 
The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.hotspot.agent module. > > Thanks, > Xuelei This pull request has now been integrated. Changeset: a50dc67a Author: Xue-Lei Andrew Fan URL: https://git.openjdk.org/jdk/commit/a50dc67a4f480fcf7183d11094d507d80b19d941 Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod 8303527: update for deprecated sprintf for jdk.hotspot.agent Reviewed-by: cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/12837 From cjplummer at openjdk.org Fri Mar 3 18:22:15 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 18:22:15 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent [v3] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 04:19:47 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.hotspot.agent module. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > update copyright year @XueleiFan In the future please get two reviews for all hotspot (and serviceability) changes. Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/12837 From xuelei at openjdk.org Fri Mar 3 18:22:15 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 3 Mar 2023 18:22:15 GMT Subject: RFR: 8303527: update for deprecated sprintf for jdk.hotspot.agent [v3] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 18:18:02 GMT, Chris Plummer wrote: > @XueleiFan In the future please get two reviews for all hotspot (and serviceability) changes. Thanks. Thank you for pointing it out, and the review. ------------- PR: https://git.openjdk.org/jdk/pull/12837 From rriggs at openjdk.org Fri Mar 3 18:31:25 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Fri, 3 Mar 2023 18:31:25 GMT Subject: Integrated: JDK-8303587 Remove VMOutOfMemoryError001 test from the problem list after 8303198 In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 16:40:41 GMT, Roger Riggs wrote: > Remove VMOutOfMemoryException001.java from the problem list, after JDK-8303198. > > The logging of Runtime.exit interfered with out-of-memory exception handling in this test. > Making the logging more robust in JDK-8303198 by handling exceptions restores the conditions expected by this test. This pull request has now been integrated. Changeset: 99443142 Author: Roger Riggs URL: https://git.openjdk.org/jdk/commit/99443142cc8280a1fc896981ef3d0ac27365d035 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8303587: Remove VMOutOfMemoryError001 test from the problem list after 8303198 Reviewed-by: cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/12859 From cjplummer at openjdk.org Fri Mar 3 20:40:57 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 20:40:57 GMT Subject: RFR: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" Message-ID: The test failure is caused by the arrival of unexpected ThreadStartEvents, which mess up the debugger side. 
The events are for threads we normally only see getting created when using virtual threads, such as carrier threads and the VirtualThread-unparker thread. Theoretically this issue could happen without virtual threads due to other VM threads starting up such as Common-Cleaner, but we haven't seen it fail for that reason yet. The test is testing proper thread suspension for ThreadStartEvent using each of the 3 suspension policy types. The debuggee creates a sequence of 3 debuggee threads, each one's timing coordinated with some complicated synchronization with the debugger using breakpoints and the setting of fields in the debuggee (and careful placement of suspend/resume in the debugger). The ThreadStartRequests that the debugger sets up always use a "count filter" of 1, which means the requests are good for delivering exactly 1 ThreadStartEvent, and any that come after the first will get filtered out. So when an unexpected ThreadStartEvent arrives for something like a carrier thread, this prematurely moves the debugger on to the next step, and the synchronization with the debuggee gets messed up. The first step in fixing this test was to remove the count filter, so the request can handle any number of ThreadStartEvents. The next step was then fixing the test library code in EventHandler.java so it would filter out any undesired ThreadStartEvents, so the test just ends up getting one event, and always for the thread it is expecting. There are a few parts to this. One is improving EventFilters.filter() to filter out more threads that tend to be created during VM startup, including carrier threads and the VirtualThread-unparker thread. It was necessary to add some calls to EventFilters.filter() from EventHandler. This was done by adding a ThreadStartEvent listener for the "spurious" thread starts (those the test debuggee does not create).
This listener is added by waitForRequestedEventCommon(), which is indirectly called by the test when it calls waitForRequestedEventSet(). There is also a 2nd place where the ThreadStartEvent listener for "spurious" threads is needed. It is now also installed with the default listeners that are always in place. It is needed when the test is not actually waiting for a ThreadStartEvent, but is waiting for a BreakpointEvent. waitForRequestedEventCommon() is not used in this case (so none of its listeners are installed), but the default listeners are always in place and can be used to filter these ThreadStartEvents. Note this filter will also be in place when calling waitForRequestedEventCommon(), but we can't rely on it when waitForRequestedEventCommon() is used for ThreadStartEvents because the spurious ThreadStartEvent will be seen and returned before we ever get to the default filter. So we actually end up with this ThreadStartEvent listener installed twice during waitForRequestedEventCommon(). I did a bit of cleanup on the test, mostly renaming of threads and ThreadStartRequests so they are easier to match up with the iteration # we use in both the debuggee and debugger (0, 1, and 2). The only real change in the test itself is removing the filter count, and verifying that the ThreadStartEvent is for the expected thread. ------------- Commit messages: - Filter out spurious ThreadStartEvents.
Changes: https://git.openjdk.org/jdk/pull/12861/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12861&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289765 Stats: 96 lines in 4 files changed: 73 ins; 1 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/12861.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12861/head:pull/12861 PR: https://git.openjdk.org/jdk/pull/12861 From alanb at openjdk.org Sat Mar 4 07:37:23 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 4 Mar 2023 07:37:23 GMT Subject: Integrated: 8303242: ThreadMXBean issues with virtual threads In-Reply-To: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> References: <_qWp1Z5LY9I2q6Wy9Zdyt-m8V9D_502fyM4X5iUJi_0=.5083a667-5c61-4bc4-9961-98d689b80b7a@github.com> Message-ID: On Mon, 27 Feb 2023 12:23:09 GMT, Alan Bateman wrote: > This PR covers a number of issues with j.l.management.ThreadMXBean, and the JDK-specific extension c.s.management.ThreadMXBean, when there are virtual threads in use. > > As background, ThreadMXBean was re-specified in Java 19 to support the monitoring and management of platform threads. It does not support virtual threads as their potential number, and the need to find a thread by id, does not make sense for this API. At the same time, JDK 19 introduced an alternative implementation of virtual threads for Zero and ports without continuations support in the VM. This alternative implementation of virtual threads means a JavaThread per virtual thread and so requires filtering to ensure that the API behaves as specified. For the initial implementation, the filtering was done in the ThreadMXBean implementation. That works for most functions but not for getThreadXXXTime(long[]) and getThreadAllocatedBytes(long[]) where the filtering needs to be pushed down to the management code. 
> > The changes in this PR move the filtering to the management functions (jmm_XXX) so they only return information about platform threads. It also fixes ThreadMXBean.getCurrentThreadCpuTime and getCurrentThreadUserTime to not throw UOE when CPU time measurement from a platform thread is supported. There are some small adjustments to the API docs (see linked CSR). Test coverage is expanded as we didn't include tests for c.s.management.ThreadMXBean with virtual threads in JDK 19. > > Testing tier1-3 (jdk_management test group is in test/jdk/:tier3), plus sanity checking that --with-jvm-variants=minimal builds as some of this code is not compiled in with minimal VM builds. This pull request has now been integrated. Changeset: 629a9053 Author: Alan Bateman URL: https://git.openjdk.org/jdk/commit/629a9053f072a3d8406b923f8fa8ab7056a1ab8d Stats: 531 lines in 8 files changed: 285 ins; 138 del; 108 mod 8303242: ThreadMXBean issues with virtual threads Reviewed-by: mchung, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/12762 From dholmes at openjdk.org Sat Mar 4 08:50:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 4 Mar 2023 08:50:44 GMT Subject: RFR: 8303151: DCmd framework cleanups [v2] In-Reply-To: References: Message-ID: > Whilst working on the DCmd code I noticed two items that could be cleaned up: > > 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. > > 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. 
> > Testing: tiers 1-3 > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12847/files - new: https://git.openjdk.org/jdk/pull/12847/files/4e8d6b56..7f29eba9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12847&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12847&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12847/head:pull/12847 PR: https://git.openjdk.org/jdk/pull/12847 From dholmes at openjdk.org Sat Mar 4 08:50:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 4 Mar 2023 08:50:44 GMT Subject: RFR: 8303151: DCmd framework cleanups In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 06:17:14 GMT, Yi Yang wrote: > Is it possible to remove DCmdRegistrant too? Thanks for looking at this. Yes this would be possible I think. Not sure why the `DCmdRegistrant` class was considered necessary ... probably to support alternative implementations of `register_dcmd_ext()`. I think it could reside in the DCmd class. ------------- PR: https://git.openjdk.org/jdk/pull/12847 From dholmes at openjdk.org Sat Mar 4 08:50:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 4 Mar 2023 08:50:44 GMT Subject: RFR: 8303151: DCmd framework cleanups [v2] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 13:27:58 GMT, Johan Sjölen wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment > > LGTM, thanks for the clean up. Thanks for the review @jdksjolen !
------------- PR: https://git.openjdk.org/jdk/pull/12847 From stuefe at openjdk.org Sat Mar 4 09:16:11 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 4 Mar 2023 09:16:11 GMT Subject: RFR: 8303151: DCmd framework cleanups [v2] In-Reply-To: References: Message-ID: On Sat, 4 Mar 2023 08:50:44 GMT, David Holmes wrote: >> Whilst working on the DCmd code I noticed two items that could be cleaned up: >> >> 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. >> >> 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. >> >> Testing: tiers 1-3 >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment +1 ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/12847 From dholmes at openjdk.org Sun Mar 5 05:46:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 5 Mar 2023 05:46:37 GMT Subject: RFR: 8303151: DCmd framework cleanups [v3] In-Reply-To: References: Message-ID: <6wxCXp_ubcVulb9p7XWSmHB7fwTswyY2SALaKs_zyjs=.c9a28950-968a-4c52-9a86-ac33ba08e80b@github.com> > Whilst working on the DCmd code I noticed two items that could be cleaned up: > > 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. > > 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. 
> > Testing: tiers 1-3 > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Relocate regoster_dcmds to DCmd class and get rid of DCmdRegistrat class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12847/files - new: https://git.openjdk.org/jdk/pull/12847/files/7f29eba9..1b1afce5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12847&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12847&range=01-02 Stats: 19 lines in 3 files changed: 5 ins; 12 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12847/head:pull/12847 PR: https://git.openjdk.org/jdk/pull/12847 From dholmes at openjdk.org Sun Mar 5 05:46:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 5 Mar 2023 05:46:37 GMT Subject: RFR: 8303151: DCmd framework cleanups [v2] In-Reply-To: References: Message-ID: On Sat, 4 Mar 2023 09:12:54 GMT, Thomas Stuefe wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment > > +1 Thanks for the review @tstuefe Re-running test builds after latest adjustment. ------------- PR: https://git.openjdk.org/jdk/pull/12847 From clanger at openjdk.org Sun Mar 5 20:46:24 2023 From: clanger at openjdk.org (Christoph Langer) Date: Sun, 5 Mar 2023 20:46:24 GMT Subject: RFR: JDK-8302320: AsyncGetCallTrace obtains too few frames in sanity test [v6] In-Reply-To: References: Message-ID: On Tue, 21 Feb 2023 08:58:50 GMT, Johannes Bechberger wrote: >> Extends the existing AsyncGetCallTrace test case and fixes the issue. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Update full name > [db483a3](https://github.com/openjdk/jdk/commit/db483a38a815f85bd9668749674b5f0f6e4b27b4). 
You need to do that on the [commit](https://github.com/openjdk/jdk/commit/db483a38a815f85bd9668749674b5f0f6e4b27b4), not on the PR. However, not sure whether your author role is sufficient and everything Github is set up correctly... ------------- PR: https://git.openjdk.org/jdk/pull/12535 From yyang at openjdk.org Mon Mar 6 02:05:27 2023 From: yyang at openjdk.org (Yi Yang) Date: Mon, 6 Mar 2023 02:05:27 GMT Subject: Integrated: 8299518: HotSpotVirtualMachine shared code across different platforms In-Reply-To: References: Message-ID: On Tue, 3 Jan 2023 09:34:55 GMT, Yi Yang wrote: > harmless refactor to share code across different platforms of VirtualMachineImpl: > 1. Shared code to process command response after requesting a command execution > 2. Read functionality in SocketInputStream can be reused This pull request has now been integrated. Changeset: 10d6a8e6 Author: Yi Yang URL: https://git.openjdk.org/jdk/commit/10d6a8e66a911d876239e44afbd76f7faf660cc3 Stats: 396 lines in 5 files changed: 94 ins; 240 del; 62 mod 8299518: HotSpotVirtualMachine shared code across different platforms Reviewed-by: cjplummer, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/11823 From yyang at openjdk.org Mon Mar 6 02:09:15 2023 From: yyang at openjdk.org (Yi Yang) Date: Mon, 6 Mar 2023 02:09:15 GMT Subject: RFR: 8303151: DCmd framework cleanups [v3] In-Reply-To: <6wxCXp_ubcVulb9p7XWSmHB7fwTswyY2SALaKs_zyjs=.c9a28950-968a-4c52-9a86-ac33ba08e80b@github.com> References: <6wxCXp_ubcVulb9p7XWSmHB7fwTswyY2SALaKs_zyjs=.c9a28950-968a-4c52-9a86-ac33ba08e80b@github.com> Message-ID: On Sun, 5 Mar 2023 05:46:37 GMT, David Holmes wrote: >> Whilst working on the DCmd code I noticed two items that could be cleaned up: >> >> 1.
The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. >> >> 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. >> >> Testing: tiers 1-3 >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Relocate regoster_dcmds to DCmd class and get rid of DCmdRegistrat class Thanks for doing this. Looks good to me. ------------- Marked as reviewed by yyang (Committer). PR: https://git.openjdk.org/jdk/pull/12847 From cjplummer at openjdk.org Mon Mar 6 04:36:03 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 Mar 2023 04:36:03 GMT Subject: RFR: 8303630: Move nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java back to general problem list Message-ID: nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java was just moved to the -Xcomp problem list, but it looks like it can happen without -Xcomp, although it looks like JFR is required in this case. In any case, it needs to move back to the general problem list. 
------------- Commit messages: - Move nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java back to general problem list Changes: https://git.openjdk.org/jdk/pull/12875/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12875&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303630 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12875.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12875/head:pull/12875 PR: https://git.openjdk.org/jdk/pull/12875 From dholmes at openjdk.org Mon Mar 6 07:09:12 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 6 Mar 2023 07:09:12 GMT Subject: RFR: 8303630: Move nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java back to general problem list In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 04:27:59 GMT, Chris Plummer wrote: > nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java was just moved to the -Xcomp problem list, but it looks like it can happen without -Xcomp, although it looks like JFR is required in this case. In any case, it needs to move back to the general problem list. Looks fine and trivial. Thanks for fixing. I would suggest making this JBS issue a subtask of [JDK-8277573](https://bugs.openjdk.org/browse/JDK-8277573) ------------- Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12875 From cjplummer at openjdk.org Mon Mar 6 07:19:21 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 Mar 2023 07:19:21 GMT Subject: RFR: 8303630: Move nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java back to general problem list In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 04:27:59 GMT, Chris Plummer wrote: > nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java was just moved to the -Xcomp problem list, but it looks like it can happen without -Xcomp, although it looks like JFR is required in this case. In any case, it needs to move back to the general problem list. Thanks David! ------------- PR: https://git.openjdk.org/jdk/pull/12875 From cjplummer at openjdk.org Mon Mar 6 07:19:22 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 Mar 2023 07:19:22 GMT Subject: Integrated: 8303630: Move nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java back to general problem list In-Reply-To: References: Message-ID: <_L8Vk0BLD2oM2PMYFTJU-e8h00vem0JtT8Abj4SzTpg=.7854f30d-83ce-48d3-bb3b-2b1e83a91386@github.com> On Mon, 6 Mar 2023 04:27:59 GMT, Chris Plummer wrote: > nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java was just moved to the -Xcomp problem list, but it looks like it can happen without -Xcomp, although it looks like JFR is required in this case. In any case, it needs to move back to the general problem list. This pull request has now been integrated. 
Changeset: 3eff1a02 Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/3eff1a022530dfaf3565844756db8736c5e80259 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod 8303630: Move nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java back to general problem list Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/12875 From tschatzl at openjdk.org Mon Mar 6 09:41:12 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 6 Mar 2023 09:41:12 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 12:30:38 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of merging two types. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright-year Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12841 From xuelei at openjdk.org Mon Mar 6 15:19:31 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 6 Mar 2023 15:19:31 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent Message-ID: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> Hi, May I have this update reviewed? The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in jdk.jdwp.agent module.
Thanks, Xuelei ------------- Commit messages: - 8303617: update for deprecated sprintf for jdk.jdwp.agent Changes: https://git.openjdk.org/jdk/pull/12870/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12870&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303617 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12870/head:pull/12870 PR: https://git.openjdk.org/jdk/pull/12870 From fparain at openjdk.org Mon Mar 6 16:47:39 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 6 Mar 2023 16:47:39 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams Message-ID: Please review this change re-implementing the FieldInfo data structure. The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. Tested with mach5, tier 1 to 7. Thank you.
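As a rough illustration of the general idea behind a variable-sized integer stream (small values take fewer bytes than a fixed-width slot), here is a minimal Java sketch. It deliberately uses a plain 7-bits-per-byte (LEB128-style) encoding and is not the unsigned5 format used by HotSpot; the class and method names are hypothetical and do not come from the PR.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: NOT HotSpot's unsigned5 encoding, only a simple
// LEB128-style scheme showing how a variable-sized stream saves space.
final class VarIntStreamSketch {

    // Append one int (treated as unsigned) using 7 payload bits per byte;
    // a set high bit means "more bytes follow".
    static void writeUnsigned(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Decode one value starting at pos[0] and advance pos[0] past it.
    static int readUnsigned(byte[] data, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = data[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Encode a sequence of values into one compact byte stream.
    static byte[] encode(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeUnsigned(out, v);
        }
        return out.toByteArray();
    }

    // Decode exactly 'count' values from the start of the stream.
    static int[] decode(byte[] data, int count) {
        int[] pos = {0};
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = readUnsigned(data, pos);
        }
        return result;
    }
}
```

With this scheme, values below 128 occupy a single byte, so a record made mostly of small indices and flags is much shorter than a fixed-width layout; the trade-off is that fields must be decoded sequentially rather than indexed directly.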
------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 - Reimplementation of FieldInfo as an unsigned5 stream Changes: https://git.openjdk.org/jdk/pull/12855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292818 Stats: 1699 lines in 52 files changed: 897 ins; 446 del; 356 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From cjplummer at openjdk.org Mon Mar 6 18:59:12 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 Mar 2023 18:59:12 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent In-Reply-To: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> Message-ID: On Sat, 4 Mar 2023 06:29:20 GMT, Xue-Lei Andrew Fan wrote: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in jdk.jdwp.agent module. > > Thanks, > Xuelei Copyright needs updating. src/jdk.jdwp.agent/share/native/libdt_socket/socketTransport.c line 224: > 222: b[received] = '\0'; > 223: /* > 224: * We should really use snprintf here but it's not available on Windows. This Windows comment should be removed.
You can probably get rid of the jio_snprintf comment below as well since it adds little value, and doesn't make sense to just mention it here and not elsewhere. ------------- Changes requested by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12870 From xuelei at openjdk.org Mon Mar 6 20:08:31 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 6 Mar 2023 20:08:31 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent [v2] In-Reply-To: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> Message-ID: <5VRSsy2pfsDF4S4eMTbnp2CQrz2QaEhwiPkbdX1FS-o=.5353e7aa-7d1c-4d6a-ae38-993816570b35@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in jdk.jdwp.agent module.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: copyright year and comment update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12870/files - new: https://git.openjdk.org/jdk/pull/12870/files/10c9be47..d5150653 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12870&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12870&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12870/head:pull/12870 PR: https://git.openjdk.org/jdk/pull/12870 From aturbanov at openjdk.org Mon Mar 6 20:14:10 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 6 Mar 2023 20:14:10 GMT Subject: RFR: 8303690: Prefer ArrayList to LinkedList in com.sun.jmx.mbeanserver.Introspector Message-ID: `LinkedList` is used as value in `com.sun.jmx.mbeanserver.Introspector.SimpleIntrospector#cache`. It's created, filled (with `add`) and then iterated. No removes from the head or something like this. `ArrayList` should be preferred as more efficient and widely used collection. Also I've done some related code cleanup: 1. removed redundant `if` from `SoftReference` value check 2.
fixed a typo in javadoc ------------- Commit messages: - [PATCH] Prefer ArrayList to LinkedList in Introspector Changes: https://git.openjdk.org/jdk/pull/12839/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12839&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303690 Stats: 8 lines in 1 file changed: 1 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12839.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12839/head:pull/12839 PR: https://git.openjdk.org/jdk/pull/12839 From stsypanov at openjdk.org Mon Mar 6 20:14:11 2023 From: stsypanov at openjdk.org (Sergey Tsypanov) Date: Mon, 6 Mar 2023 20:14:11 GMT Subject: RFR: 8303690: Prefer ArrayList to LinkedList in com.sun.jmx.mbeanserver.Introspector In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 20:16:35 GMT, Andrey Turbanov wrote: > `LinkedList` is used as value in `com.sun.jmx.mbeanserver.Introspector.SimpleIntrospector#cache` > It's created, filled (with `add`) and then iterated. No removes from the head or something like this. `ArrayList` should be preferred as more efficient and widely used collection. > Also I've done some related code cleanup: > 1. removed redundant `if` from `SoftReference` value check > 2. fixed a typo in javadoc Marked as reviewed by stsypanov (Author). ------------- PR: https://git.openjdk.org/jdk/pull/12839 From prappo at openjdk.org Mon Mar 6 20:22:48 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Mon, 6 Mar 2023 20:22:48 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e.
the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix had been approved, as required. Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8303480 - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12826/files - new: https://git.openjdk.org/jdk/pull/12826/files/d2f4a553..87166408 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12826&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12826&range=00-01 Stats: 13433 lines in 415 files changed: 9003 ins; 2610 del; 1820 mod Patch: https://git.openjdk.org/jdk/pull/12826.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12826/head:pull/12826 PR: https://git.openjdk.org/jdk/pull/12826 From cjplummer at openjdk.org Mon Mar 6 20:26:22 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 Mar 2023 20:26:22 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent [v2] In-Reply-To: <5VRSsy2pfsDF4S4eMTbnp2CQrz2QaEhwiPkbdX1FS-o=.5353e7aa-7d1c-4d6a-ae38-993816570b35@github.com> References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> <5VRSsy2pfsDF4S4eMTbnp2CQrz2QaEhwiPkbdX1FS-o=.5353e7aa-7d1c-4d6a-ae38-993816570b35@github.com> Message-ID: On Mon, 6 Mar 2023 20:08:31 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? 
>> >> The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in jdk.jdwp.agent module. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year and comment update windows/native/libjdwp/linker_md.c also needs its copyright updated. ------------- PR: https://git.openjdk.org/jdk/pull/12870 From jjg at openjdk.org Mon Mar 6 20:31:18 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Mon, 6 Mar 2023 20:31:18 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> Message-ID: On Fri, 3 Mar 2023 11:31:04 GMT, Alexey Ivanov wrote: >> Yes, iff means if-and-only-if and is used for extra precision in formal logic, mathematics. As @pavelrappo points out it's a relatively common occurrence in the OpenJDK sources, though perhaps not in the public javadocs. Perhaps a bit pretentious, but mostly a terse way to say "return true if the BSM method type exactly matches X, otherwise false". >> >> The broken link stems from the fact that the method I was targeting (a way to use condy for lambda proxy singletons rather than a `MethodHandle.constant`) was never integrated. We'll look at either getting that done (@briangoetz suggested the time might be ready for it) or remove this currently pointless static bootstrap specialization test. > >> Yes, iff means if-and-only-if and is used for extra precision in formal logic, mathematics.
> > I've never come across it before. With your explanations, it makes perfect sense. I would recommend (separately) changing `iff` to the expanded form `if and only if` ------------- PR: https://git.openjdk.org/jdk/pull/12826 From jjg at openjdk.org Mon Mar 6 20:36:13 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Mon, 6 Mar 2023 20:36:13 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by jjg (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From kevinw at openjdk.org Mon Mar 6 20:37:17 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 6 Mar 2023 20:37:17 GMT Subject: RFR: 8303690: Prefer ArrayList to LinkedList in com.sun.jmx.mbeanserver.Introspector In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 20:16:35 GMT, Andrey Turbanov wrote: > `LinkedList` is used as value in `com.sun.jmx.mbeanserver.Introspector.SimpleIntrospector#cache` > It's created, filled (with `add`) and then iterated. No removes from the head or something like this. `ArrayList` should be preferred as more efficient and widely used collection. > Also I've done some related code cleaned: > 1. removed redundand `if` from `SoftReference` value check > 2. fixed a typo in javadoc Looks good to me 8-) ------------- Marked as reviewed by kevinw (Committer). 
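For readers following the 8303690 thread, the fill-then-iterate pattern described in the PR text can be sketched roughly as below. This is only an illustration, not the actual Introspector code: the class name MethodNameCache, the method publicMethodNames, and the choice of a WeakHashMap keyed by Class are all assumptions made for the example. The point it tries to show is that a list which is only appended to and then iterated has no need for LinkedList, and that a single null check covers both a missing and a cleared SoftReference.

```java
import java.lang.ref.SoftReference;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical cache, loosely modelled on the description in the PR text.
final class MethodNameCache {
    // Keys are weakly held; each value is a softly held list.
    private final Map<Class<?>, SoftReference<List<String>>> cache = new WeakHashMap<>();

    synchronized List<String> publicMethodNames(Class<?> c) {
        SoftReference<List<String>> ref = cache.get(c);
        // One check handles both "no entry" and "entry cleared by the GC".
        List<String> names = (ref == null) ? null : ref.get();
        if (names == null) {
            // Created, filled with add(), then only iterated: ArrayList fits.
            names = new ArrayList<>();
            for (Method m : c.getMethods()) {
                names.add(m.getName());
            }
            cache.put(c, new SoftReference<>(names));
        }
        return names;
    }
}
```

A caller would simply iterate the returned list; nothing is ever removed from its head, which is the only access pattern where LinkedList would normally be expected to pay off.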
PR: https://git.openjdk.org/jdk/pull/12839 From cjplummer at openjdk.org Mon Mar 6 20:37:17 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 Mar 2023 20:37:17 GMT Subject: RFR: 8303690: Prefer ArrayList to LinkedList in com.sun.jmx.mbeanserver.Introspector In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 20:16:35 GMT, Andrey Turbanov wrote: > `LinkedList` is used as value in `com.sun.jmx.mbeanserver.Introspector.SimpleIntrospector#cache` > It's created, filled (with `add`) and then iterated. No removes from the head or something like this. `ArrayList` should be preferred as more efficient and widely used collection. > Also I've done some related code cleaned: > 1. removed redundand `if` from `SoftReference` value check > 2. fixed a typo in javadoc Marked as reviewed by cjplummer (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12839 From lancea at openjdk.org Mon Mar 6 20:39:17 2023 From: lancea at openjdk.org (Lance Andersen) Date: Mon, 6 Mar 2023 20:39:17 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by lancea (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From rriggs at openjdk.org Mon Mar 6 21:29:08 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 6 Mar 2023 21:29:08 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by rriggs (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From lmesnik at openjdk.org Mon Mar 6 23:55:15 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 6 Mar 2023 23:55:15 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests Message-ID: Provide a way to start debugee threads as platform or virtual for debugee in com/sun/jdi tests. ------------- Commit messages: - Added newThread method. 
Changes: https://git.openjdk.org/jdk/pull/12894/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12894&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303702 Stats: 51 lines in 4 files changed: 17 ins; 11 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/12894.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12894/head:pull/12894 PR: https://git.openjdk.org/jdk/pull/12894 From cjplummer at openjdk.org Tue Mar 7 01:05:14 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 7 Mar 2023 01:05:14 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 23:47:09 GMT, Leonid Mesnik wrote: > Provide a way to start debuggee threads as platform or virtual threads in com/sun/jdi tests. Changes requested by cjplummer (Reviewer). test/jdk/com/sun/jdi/TestScaffold.java line 1037: > 1035: Object builder = Thread.class.getMethod("ofVirtual").invoke(null); > 1036: Class clazz = Class.forName("java.lang.Thread$Builder"); > 1037: java.lang.reflect.Method start = clazz.getMethod("unstarted", Runnable.class); rename start to unstarted. ------------- PR: https://git.openjdk.org/jdk/pull/12894 From cjplummer at openjdk.org Tue Mar 7 01:41:12 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 7 Mar 2023 01:41:12 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 23:47:09 GMT, Leonid Mesnik wrote: > Provide a way to start debuggee threads as platform or virtual threads in com/sun/jdi tests.
I think all of the following could also be converted, although most will be a bit harder to accomplish since they create subclasses of the Thread class: JdbStopThreadidTest, InterruptHangTest, InvokeHangTest, MultiBreakpointsTest, SimulResumerTest, TwoThreads, ResumeOneThread. ------------- PR: https://git.openjdk.org/jdk/pull/12894 From xuelei at openjdk.org Tue Mar 7 02:16:56 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 7 Mar 2023 02:16:56 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent [v3] In-Reply-To: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> Message-ID: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This update is one part of that work, covering the sprintf uses in the jdk.jdwp.agent module.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: copyright year update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12870/files - new: https://git.openjdk.org/jdk/pull/12870/files/d5150653..905e2e31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12870&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12870&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12870/head:pull/12870 PR: https://git.openjdk.org/jdk/pull/12870 From xuelei at openjdk.org Tue Mar 7 02:16:57 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 7 Mar 2023 02:16:57 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent [v2] In-Reply-To: References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> <5VRSsy2pfsDF4S4eMTbnp2CQrz2QaEhwiPkbdX1FS-o=.5353e7aa-7d1c-4d6a-ae38-993816570b35@github.com> Message-ID: On Mon, 6 Mar 2023 20:23:13 GMT, Chris Plummer wrote: > windows/native/libjdwp/linker_md.c also needs its copyright updated. Oops, I missed this file. Updated. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/12870 From lmesnik at openjdk.org Tue Mar 7 03:54:56 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 7 Mar 2023 03:54:56 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v2] In-Reply-To: References: Message-ID: <-6X9W5R_Bk1ZG3GZGvTDJhjswFHPjIIKiCF0mXcagt0=.88514348-7df4-4586-b768-ecce8b8d184b@github.com> > Provide a way to start debuggee threads as platform or virtual threads in com/sun/jdi tests. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: more threads are virtualized.
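The idea in the 8303702 thread (creating debuggee threads as either platform or virtual threads, with the virtual-thread API looked up reflectively as in the TestScaffold snippet quoted in the review) might be sketched as follows. This is not the TestScaffold implementation: the class name DebuggeeThreads and the helper startAndJoin are hypothetical, and the fallback to a platform thread when the virtual-thread API is missing or disabled is an assumption of this sketch; a real test harness may well prefer to fail instead.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not the actual TestScaffold code.
final class DebuggeeThreads {
    private DebuggeeThreads() {}

    /** Returns an unstarted thread that will run the given task. */
    static Thread newThread(Runnable task, boolean virtual) {
        if (virtual) {
            try {
                // Reflective equivalent of Thread.ofVirtual().unstarted(task),
                // so this class still compiles on JDKs without that API.
                Object builder = Thread.class.getMethod("ofVirtual").invoke(null);
                Class<?> builderClass = Class.forName("java.lang.Thread$Builder");
                Method unstarted = builderClass.getMethod("unstarted", Runnable.class);
                return (Thread) unstarted.invoke(builder, task);
            } catch (ReflectiveOperationException | RuntimeException e) {
                // Assumption of this sketch: fall back to a platform thread
                // when virtual threads are unavailable or not enabled.
            }
        }
        return new Thread(task);
    }

    /** Starts the thread and waits for it; returns false if interrupted. */
    static boolean startAndJoin(Runnable task, boolean virtual) {
        Thread t = newThread(task, virtual);
        t.start();
        try {
            t.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Tests that subclass Thread do not fit this shape directly, since a virtual thread cannot be created from a Thread subclass; the run logic would presumably have to move into a Runnable first.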
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12894/files - new: https://git.openjdk.org/jdk/pull/12894/files/60b793be..9951ab3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12894&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12894&range=00-01 Stats: 63 lines in 10 files changed: 3 ins; 19 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/12894.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12894/head:pull/12894 PR: https://git.openjdk.org/jdk/pull/12894 From cjplummer at openjdk.org Tue Mar 7 05:35:04 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 7 Mar 2023 05:35:04 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent [v3] In-Reply-To: References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> Message-ID: <8KoM_IgHA_02_U4V62R_WNrdVcU98NYTHtkvPS1Y0cA=.703c86fa-5a2f-4301-b0db-ec9d1b89d5dc@github.com> On Tue, 7 Mar 2023 02:16:56 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This update is one part of that work, covering the sprintf uses in the jdk.jdwp.agent module. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Marked as reviewed by cjplummer (Reviewer).
------------- PR: https://git.openjdk.org/jdk/pull/12870 From cjplummer at openjdk.org Tue Mar 7 05:44:07 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 7 Mar 2023 05:44:07 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v2] In-Reply-To: <-6X9W5R_Bk1ZG3GZGvTDJhjswFHPjIIKiCF0mXcagt0=.88514348-7df4-4586-b768-ecce8b8d184b@github.com> References: <-6X9W5R_Bk1ZG3GZGvTDJhjswFHPjIIKiCF0mXcagt0=.88514348-7df4-4586-b768-ecce8b8d184b@github.com> Message-ID: On Tue, 7 Mar 2023 03:54:56 GMT, Leonid Mesnik wrote: >> Provide a way to start debuggee threads as platform or virtual threads in com/sun/jdi tests. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > more threads are virtualized. Looks good. Thanks for getting these tests updated. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12894 From dholmes at openjdk.org Tue Mar 7 08:15:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 08:15:20 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 14:50:34 GMT, Frederic Parain wrote: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of its lack of strong typing and its semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store that metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
> > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Hi Fred, I've taken one pass through this but it is a huge set of changes to try and digest. At this stage just a bunch of style nits. Thanks. src/hotspot/share/classfile/classFileParser.cpp line 1632: > 1630: { > 1631: debug_only(NoSafepointVerifier nsv;) > 1632: for(int i = 0; i < _temp_field_info->length(); i++) { Nit: space after 'for' please src/hotspot/share/classfile/classFileParser.cpp line 2012: > 2010: void ClassFileParser::FieldAnnotationCollector::apply_to(FieldInfo* f) { > 2011: if (is_contended()) > 2012: // Setting the contended group also set the contended bit in field flags Nit: s/set/sets/ src/hotspot/share/oops/fieldInfo.cpp line 30: > 28: > 29: void FieldInfo::print(outputStream* os, ConstantPool* cp) { > 30: os->print_cr("index=%d name_index=%d name=%s signature_index=%d signature=%s offset=%d AccessFlags=%d FieldFlags=%d initval_index=%d gen_signature_index=%d, gen_signature=%s contended_group=%d", Nit: please break up this line src/hotspot/share/oops/fieldInfo.cpp line 120: > 118: *java_fields_count = r.next_uint(); > 119: *injected_fields_count = r.next_uint(); > 120: while(r.has_next()) { Nit: space before ( src/hotspot/share/oops/fieldInfo.cpp line 135: > 133: int java_field_count = r.next_uint(); > 134: int injected_fields_count = r.next_uint(); > 135: while(r.has_next()) { Nit: space before ( src/hotspot/share/oops/fieldInfo.cpp line 140: > 138: fi.print(os, cp); > 139: } > 140: } Newline
needed at EOF src/hotspot/share/oops/fieldInfo.inline.hpp line 135: > 133: new_flags = old_flags | mask; > 134: witness = Atomic::cmpxchg(&flags, old_flags, new_flags); > 135: } while(witness != old_flags); Nit: space before ( src/hotspot/share/oops/fieldInfo.inline.hpp line 155: > 153: inline void FieldStatus::update_access_watched(bool z) { update_flag(_fs_access_watched, z); } > 154: inline void FieldStatus::update_modification_watched(bool z) { update_flag(_fs_modification_watched, z); } > 155: inline void FieldStatus::update_initialized_final_update(bool z) {update_flag(_initialized_final_update, z); } Nit: space after { src/hotspot/share/oops/fieldStreams.hpp line 43: > 41: protected: > 42: const Array* _fieldinfo_stream; > 43: FieldInfoReader _reader; Nit variable name alignment seems off ------------- PR: https://git.openjdk.org/jdk/pull/12855 From ayang at openjdk.org Tue Mar 7 08:16:36 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 7 Mar 2023 08:16:36 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 12:30:38 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of merging two types. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright-year Thanks for the review.
------------- PR: https://git.openjdk.org/jdk/pull/12841 From ayang at openjdk.org Tue Mar 7 08:16:39 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 7 Mar 2023 08:16:39 GMT Subject: Integrated: 8303534: Merge CompactibleSpace into ContiguousSpace In-Reply-To: References: Message-ID: <6U7ZF7mVsSZrtPD76uzRubWX5vFJTzeteT9nCjOu-Ko=.092d53b2-6aad-4930-b4a2-ffce14080ef6@github.com> On Thu, 2 Mar 2023 21:49:39 GMT, Albert Mingkun Yang wrote: > Simple refactoring of merging two types. > > Test: tier1-5 This pull request has now been integrated. Changeset: 7fbfc884 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/7fbfc884f0980169e8c08167d59147222728b66b Stats: 197 lines in 14 files changed: 27 ins; 122 del; 48 mod 8303534: Merge CompactibleSpace into ContiguousSpace Reviewed-by: cjplummer, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/12841 From ihse at openjdk.org Tue Mar 7 11:19:08 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 7 Mar 2023 11:19:08 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by ihse (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From coleenp at openjdk.org Tue Mar 7 14:14:39 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 7 Mar 2023 14:14:39 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 21:37:34 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. 
> > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV This looks really good. I noted a few changes and questions. src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 53: > 51: > 52: #undef __ > 53: #define __ Disassembler::hook(__FILE__, __LINE__, _masm)-> What is this? Is this something useful for debugging the template interpreter? Probably doesn't belong with this change but might be nice to have (?) @reinrich src/hotspot/cpu/x86/templateTable_x86.cpp line 2801: > 2799: bool is_invokevirtual, > 2800: bool is_invokevfinal, /*unused*/ > 2801: bool is_invokedynamic /*unused*/) { I assume you have to keep this parameter for the platform that doesn't still have this change (s390)? src/hotspot/share/cds/classListParser.cpp line 590: > 588: // resolve it > 589: Handle recv; > 590: LinkResolver::resolve_invoke(info, recv, pool, ConstantPool::encode_invokedynamic_index(indy_index), Bytecodes::_invokedynamic, CHECK); nit: can you reformat so the line isn't so long? src/hotspot/share/ci/ciReplay.cpp line 419: > 417: be used to avoid multiple blocks of similar code. When CPCE is obsoleted > 418: these can be removed > 419: */ I don't know if you really need this comment. If so, use // style instead. src/hotspot/share/ci/ciReplay.cpp line 453: > 451: if (!parse_terminator()) { > 452: report_error("no dynamic invoke found"); > 453: return NULL; nullptr not NULL. 
src/hotspot/share/interpreter/rewriter.hpp line 143: > 141: assert(ref_index >= _resolved_reference_limit, ""); > 142: if (_pool->tag_at(cp_index).value() != JVM_CONSTANT_InvokeDynamic) { > 143: _invokedynamic_references_map.at_put_grow(ref_index, cache_index, -1); I think you might need to rename the _invokedynamic_references_map variable to _invokehandle_references_map with this change as well. Otherwise this will be confusing. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 639: > 637: int indy_index = -1; > 638: for (int i = 0; i < cp->resolved_indy_entries_length(); i++) { > 639: tty->print_cr("Index: %d", cp->resolved_indy_entry_at(i)->constant_pool_index()); Looks like some debugging left in. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 1529: > 1527: if (cp_cache_entry->is_resolved(Bytecodes::_invokedynamic)) { > 1528: return Bytecodes::_invokedynamic; > 1529: } This seems like it should be removed? src/hotspot/share/oops/cpCache.cpp line 727: > 725: set_reference_map(nullptr); > 726: #if INCLUDE_CDS > 727: if (_initial_entries != nullptr) { @iklam with moving invokedynamic entries out, do you still need to save initialized entries? Does invokehandle need this? (Should have separate RFE if more cleanup is possible) src/hotspot/share/oops/resolvedIndyEntry.hpp line 26: > 24: > 25: #ifndef SHARE_OOPS_ResolvedIndyEntry_HPP > 26: #define SHARE_OOPS_ResolvedIndyEntry_HPP Make this all capital letters src/hotspot/share/oops/resolvedIndyEntry.hpp line 71: > 69: > 70: // Bit shift to get flags > 71: // Note: Only one flag exists at the moment but more could be added Actually two flags - resolution_failed too. src/hotspot/share/oops/resolvedIndyEntry.hpp line 87: > 85: bool is_vfinal() const { return false; } > 86: bool is_final() const { return false; } > 87: bool has_local_signature() const { return true; } The closing } don't need to be aligned.
src/hotspot/share/oops/resolvedIndyEntry.hpp line 111: > 109: _return_type = return_type; > 110: set_flags(has_appendix); > 111: Atomic::release_store(&_method, m); Add a comment like // set this last. The method is read lock free from the entry and if set, indicates the rest of the resolution information is valid. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyArray.java line 35: > 33: import sun.jvm.hotspot.types.WrongTypeException; > 34: import sun.jvm.hotspot.utilities.GenericArray; > 35: import sun.jvm.hotspot.utilities.Observable; Do you need all of these imports ? src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 935: > 933: /*if (isInvokedynamicIndex(cpi)) { > 934: compilerToVM().resolveInvokeDynamicInPool(this, cpi); > 935: }*/ Is there something to fix here? ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Tue Mar 7 15:08:12 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 7 Mar 2023 15:08:12 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:30:50 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. 
>> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 53: > >> 51: >> 52: #undef __ >> 53: #define __ Disassembler::hook(__FILE__, __LINE__, _masm)-> > > What is this? Is this something useful for debugging the template interpreter? Probably doesn't belong with this change but might be nice to have (?) @reinrich Yes this is really useful when debugging the template interpreter. It annotates the disassembly with the generator source code. It helped tracking down a bug in the ppc part oft this pr. Other platforms have it too. Example: invokedynamic 186 invokedynamic [0x00003fff80075a00, 0x00003fff80075dc8] 968 bytes -------------------------------------------------------------------------------- 0x00003fff80075a00: std r17,0(r15) ;;@FILE: src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); 0x00003fff80075a04: addi r15,r15,-8 0x00003fff80075a08: b 0x00003fff80075a40 ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); 0x00003fff80075a0c: stfs f15,0(r15) ;; 2186: fep = __ pc(); __ push_f(); __ b(L); 0x00003fff80075a10: addi r15,r15,-8 0x00003fff80075a14: b 0x00003fff80075a40 ;; 2186: fep = __ pc(); __ push_f(); __ b(L); 0x00003fff80075a18: stfd f15,-8(r15) ;; 2187: dep = __ pc(); __ push_d(); __ b(L); 0x00003fff80075a1c: addi r15,r15,-16 0x00003fff80075a20: b 0x00003fff80075a40 ;; 2187: dep = __ pc(); __ push_d(); __ b(L); 0x00003fff80075a24: li r0,0 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); 0x00003fff80075a28: std r0,0(r15) 0x00003fff80075a2c: std r17,-8(r15) 0x00003fff80075a30: addi r15,r15,-16 0x00003fff80075a34: b 0x00003fff80075a40 ;; 2188: lep = __ pc(); 
__ push_l(); __ b(L); 0x00003fff80075a38: stw r17,0(r15) ;; 2189: __ align(32, 12, 24); // align L ;; 2191: iep = __ pc(); __ push_i(); 0x00003fff80075a3c: addi r15,r15,-8 0x00003fff80075a40: li r21,1 ;; 2192: vep = __ pc(); ;; 2193: __ bind(L); ;;@FILE: src/hotspot/share/interpreter/templateInterpreterGenerator.cpp ;; 366: __ verify_FPU(1, t->tos_in()); ;;@FILE: src/hotspot/cpu/ppc/templateTable_ppc_64.cpp ;; 2293: __ load_resolved_indy_entry(cache, index); 0x00003fff80075a44: lwax r21,r14,r21 0x00003fff80075a48: nand r21,r21,r21 0x00003fff80075a4c: ld r31,40(r27) 0x00003fff80075a50: rldicr r21,r21,4,59 0x00003fff80075a54: addi r21,r21,8 0x00003fff80075a58: add r31,r31,r21 0x00003fff80075a5c: ld r22,0(r31) ;; 2294: __ ld_ptr(method, in_bytes(ResolvedIndyEntry::method_offset()), cache); 0x00003fff80075a60: cmpdi r22,0 ;; 2297: __ cmpdi(CCR0, method, 0); 0x00003fff80075a64: bne- 0x00003fff80075b94 ;; 2298: __ bne(CCR0, resolved);,bo=0b00100[no_hint] 0x00003fff80075a68: li r4,186 ;; 2304: __ li(R4_ARG2, code); 0x00003fff80075a6c: ld r11,0(r1) ;; 2305: __ call_VM(noreg, entry, R4_ARG2, true); ------------- PR: https://git.openjdk.org/jdk/pull/12778 From prappo at openjdk.org Tue Mar 7 15:35:51 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Tue, 7 Mar 2023 15:35:51 GMT Subject: Integrated: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. 
the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix had been approved, as required. This pull request has now been integrated. Changeset: 45a616a8 Author: Pavel Rappo URL: https://git.openjdk.org/jdk/commit/45a616a891e4a4b0e77b1f2fa040522f4a99d172 Stats: 75 lines in 39 files changed: 0 ins; 0 del; 75 mod 8303480: Miscellaneous fixes to mostly invisible doc comments Reviewed-by: mullan, prr, cjplummer, aivanov, jjg, lancea, rriggs, ihse ------------- PR: https://git.openjdk.org/jdk/pull/12826 From matsaave at openjdk.org Tue Mar 7 18:46:43 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 18:46:43 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: <6-fIr9UsLHUJXqqwQNVvQrL-Q6MP_UoZAL0W7ZLDHb8=.da0c0246-cd98-4309-9247-b792212f6021@github.com> On Tue, 7 Mar 2023 14:10:33 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. 
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 935: > >> 933: /*if (isInvokedynamicIndex(cpi)) { >> 934: compilerToVM().resolveInvokeDynamicInPool(this, cpi); >> 935: }*/ > > Is there something to fix here? That's a vestige of old code that I don't believe is necessary anymore. Invokedynamic is resolved further up so that can be removed. I think it makes sense to leave the invokedynamic case for completeness, but it will be left blank. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From lmesnik at openjdk.org Tue Mar 7 19:00:25 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 7 Mar 2023 19:00:25 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v3] In-Reply-To: References: Message-ID: <1jQkL_QNYGBgL1OjiodwrCCu_dd0s5fVbXMDjjMxMdY=.1575c59b-1d8d-4007-9953-af2ff33f7293@github.com> > Provide a way to start debugee threads as platform or virtual for debugee in com/sun/jdi tests. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed JdbStopThreadidTest to support new change. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12894/files - new: https://git.openjdk.org/jdk/pull/12894/files/9951ab3e..0ceb7bc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12894&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12894&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12894.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12894/head:pull/12894 PR: https://git.openjdk.org/jdk/pull/12894 From dcubed at openjdk.org Tue Mar 7 19:00:29 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 7 Mar 2023 19:00:29 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v2] In-Reply-To: <-6X9W5R_Bk1ZG3GZGvTDJhjswFHPjIIKiCF0mXcagt0=.88514348-7df4-4586-b768-ecce8b8d184b@github.com> References: <-6X9W5R_Bk1ZG3GZGvTDJhjswFHPjIIKiCF0mXcagt0=.88514348-7df4-4586-b768-ecce8b8d184b@github.com> Message-ID: <7J-HZX8V8FnxUsaL9JNjybv8Aj-c9CGQE2xjU36Sing=.0174b74e-c32c-44b7-9291-e66ccd23e5fc@github.com> On Tue, 7 Mar 2023 03:54:56 GMT, Leonid Mesnik wrote: >> Provide a way to start debugee threads as platform or virtual for debugee in com/sun/jdi tests. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > more threads are virtualized. What kind of testing has been done on these changes? 
------------- PR: https://git.openjdk.org/jdk/pull/12894 From lmesnik at openjdk.org Tue Mar 7 19:01:36 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 7 Mar 2023 19:01:36 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v2] In-Reply-To: <-6X9W5R_Bk1ZG3GZGvTDJhjswFHPjIIKiCF0mXcagt0=.88514348-7df4-4586-b768-ecce8b8d184b@github.com> References: <-6X9W5R_Bk1ZG3GZGvTDJhjswFHPjIIKiCF0mXcagt0=.88514348-7df4-4586-b768-ecce8b8d184b@github.com> Message-ID: <5qak7SGd68mRxSZ_XsvG51wQdmKiNSnxsY-uPfcrfug=.167f041f-b9ef-4cff-b9b3-d63aa4776ff2@github.com> On Tue, 7 Mar 2023 03:54:56 GMT, Leonid Mesnik wrote: >> Provide a way to start debugee threads as platform or virtual for debugee in com/sun/jdi tests. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > more threads are virtualized. I ran :jdk_jdi locally, both directly and with the wrapper. (Last time I forgot to remove an extra problem-list entry and missed one issue.) I've also submitted tier1-5 testing in mach5, to cover different GC/Comp options and platforms. It is still in progress. ------------- PR: https://git.openjdk.org/jdk/pull/12894 From dcubed at openjdk.org Tue Mar 7 19:13:09 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 7 Mar 2023 19:13:09 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v3] In-Reply-To: <1jQkL_QNYGBgL1OjiodwrCCu_dd0s5fVbXMDjjMxMdY=.1575c59b-1d8d-4007-9953-af2ff33f7293@github.com> References: <1jQkL_QNYGBgL1OjiodwrCCu_dd0s5fVbXMDjjMxMdY=.1575c59b-1d8d-4007-9953-af2ff33f7293@github.com> Message-ID: On Tue, 7 Mar 2023 19:00:25 GMT, Leonid Mesnik wrote: >> Provide a way to start debugee threads as platform or virtual for debugee in com/sun/jdi tests. 
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed JdbStopThreadidTest to support new change. Thanks for documenting the testing in the PR! ------------- PR: https://git.openjdk.org/jdk/pull/12894 From matsaave at openjdk.org Tue Mar 7 19:16:35 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 19:16:35 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:35:18 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/cpu/x86/templateTable_x86.cpp line 2801: > >> 2799: bool is_invokevirtual, >> 2800: bool is_invokevfinal, /*unused*/ >> 2801: bool is_invokedynamic /*unused*/) { > > I assume you have to keep this parameter for the platform that doesn't still have this change (s390)? 
That's correct, this method is declared inside the hpp used by all platforms, so the parameters can't be changed until all the ports are complete. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Tue Mar 7 19:21:23 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 19:21:23 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:39:50 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/share/ci/ciReplay.cpp line 419: > >> 417: be used to avoid multiple blocks of similar code. When CPCE is obsoleted >> 418: these can be removed >> 419: */ > > I don't know if you really need this comment. If so, use // style instead. I think it's worth keeping around as a reminder for cleanup down the line since it's easy to overlook. I will change it to // style. 
------------- PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Tue Mar 7 19:29:01 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 19:29:01 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 14:00:19 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/share/oops/cpCache.cpp line 727: > >> 725: set_reference_map(nullptr); >> 726: #if INCLUDE_CDS >> 727: if (_initial_entries != nullptr) { > > @iklam with moving invokedynamic entries out, do you still need to save initialized entries ? Does invokehandle need this? (Should have separate RFE if more cleanup is possible) This along with the previous comment about `_invokedynamic_references_map` would probably be better suited for their own RFE. 
I think the scope of this PR should be limited to the indy structure and its implementation, so any changes related to invokehandle can be traced more easily. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rkennke at openjdk.org Tue Mar 7 20:34:57 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 7 Mar 2023 20:34:57 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v13] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. 
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are performed only outside performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. 
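
The lock-stack described in the message above can be modeled roughly as follows. This is only an illustrative sketch in plain Java, not HotSpot code: the class and method names (`LockStackSketch`, `tryFastLock`, `tryFastUnlock`) are hypothetical, the header CAS and GC-root aspects are left out, and it assumes unlocking happens in reverse order of locking.

```java
import java.util.Arrays;

/**
 * Illustrative model only: a per-thread "lock stack" holding the objects
 * this thread has fast-locked. All names here are hypothetical.
 */
final class LockStackSketch {
    private Object[] elements;   // references to objects fast-locked by this thread
    private int top;             // number of entries currently in use

    LockStackSketch(int initialCapacity) {
        elements = new Object[initialCapacity];
    }

    /** "Does the current thread own me?" is a linear scan comparing identity. */
    boolean contains(Object o) {
        for (int i = 0; i < top; i++) {
            if (elements[i] == o) {
                return true;
            }
        }
        return false;
    }

    /**
     * Models an uncontended fast-lock. Returns false on a recursive attempt,
     * which in the description is the case that inflates to a full monitor.
     */
    boolean tryFastLock(Object o) {
        if (contains(o)) {
            return false;                       // recursive locking is not supported (yet)
        }
        if (top == elements.length) {           // overflow check: grow when needed
            elements = Arrays.copyOf(elements, Math.max(1, elements.length * 2));
        }
        elements[top++] = o;
        return true;
    }

    /** Models a fast-unlock; only the most recently locked object can be popped here. */
    boolean tryFastUnlock(Object o) {
        if (top == 0 || elements[top - 1] != o) {
            return false;
        }
        elements[--top] = null;
        return true;
    }

    int size() {
        return top;
    }

    public static void main(String[] args) {
        LockStackSketch stack = new LockStackSketch(2);
        Object a = new Object();
        Object b = new Object();
        stack.tryFastLock(a);
        stack.tryFastLock(b);
        System.out.println(stack.contains(a));     // true
        System.out.println(stack.tryFastLock(a));  // false: recursive attempt
        stack.tryFastUnlock(b);
        System.out.println(stack.size());          // 1
    }
}
```

Because the array typically stays at a handful of entries, the linear scan in `contains` is cheap, which is the property the description relies on for the ownership query.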
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix interpreter asymmetric fast-locking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/ed611b0b..9d4ca05f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=11-12 Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From pchilanomate at openjdk.org Tue Mar 7 22:31:11 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 7 Mar 2023 22:31:11 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV Message-ID: Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(), in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we found a return barrier pc (as in Continuation::is_return_barrier_entry()) and we want to keep walking the physical stack then we know the sender will be the enterSpecial frame so we create it by calling ContinuationEntry::to_frame(). This method assumes there can only be one nmethod associated with enterSpecial() so we hit an assert later on. See the bug for more details of the crash. As I mentioned in the bug we don't need to rely on this assumption since we can re-read the updated value from _enter_special. 
But reading both _enter_special and _return_pc means we would need some kind of synchronization since to_frame() could be called concurrently with set_enter_code(). To avoid that we could just read _return_pc and calculate the blob from it each time, but I'm also assuming that overhead is undesired and that's why the static variable was introduced. Alternatively _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). But if we go this route I think we would need to do a small fix on thaw too. After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). I'm not sure about the lifecycle of the old CodeBlob but it seems it could have been already removed if enterSpecial was not found while traversing everybody's stack. Maybe there are more issues. The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another less restrictive option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired. I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test. 
Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/12911/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12911&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302779 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12911.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12911/head:pull/12911 PR: https://git.openjdk.org/jdk/pull/12911 From dholmes at openjdk.org Tue Mar 7 22:58:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 22:58:19 GMT Subject: Integrated: 8303151: DCmd framework cleanups In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 04:59:44 GMT, David Holmes wrote: > Whilst working on the DCmd code I noticed two items that could be cleaned up: > > 1. The `NMTDCmd` is registered after the call to `register_dcmds()` instead of inside it. > > 2. The "extension" mechanism to define external DCmds (as added by [JDK-7132515](https://bugs.openjdk.org/browse/JDK-7132515) for `UnlockCommercialFeatures`) is no longer needed. > > Testing: tiers 1-3 > > Thanks This pull request has now been integrated. 
Changeset: 5f1108f8 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/5f1108f8f0768837591b06d47dec857963ed1fcb Stats: 32 lines in 3 files changed: 6 ins; 23 del; 3 mod 8303151: DCmd framework cleanups Reviewed-by: jsjolen, stuefe, yyang ------------- PR: https://git.openjdk.org/jdk/pull/12847 From sspitsyn at openjdk.org Wed Mar 8 00:42:13 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Mar 2023 00:42:13 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v3] In-Reply-To: <1jQkL_QNYGBgL1OjiodwrCCu_dd0s5fVbXMDjjMxMdY=.1575c59b-1d8d-4007-9953-af2ff33f7293@github.com> References: <1jQkL_QNYGBgL1OjiodwrCCu_dd0s5fVbXMDjjMxMdY=.1575c59b-1d8d-4007-9953-af2ff33f7293@github.com> Message-ID: <1OOpAXnGfMH7hNOBh_27_RXV1t8M2rQ8OcP8DqcXbRk=.3bfcce6b-3e50-40b6-a10c-4ae9a9700dd4@github.com> On Tue, 7 Mar 2023 19:00:25 GMT, Leonid Mesnik wrote: >> Provide a way to start debugee threads as platform or virtual for debugee in com/sun/jdi tests. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed JdbStopThreadidTest to support new change. The fix as it is looks good. In general, there are more `jdk_jdi` tests that could support virtual threads. Is there any plan about it? Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12894 From sspitsyn at openjdk.org Wed Mar 8 04:14:16 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Mar 2023 04:14:16 GMT Subject: RFR: 8303489: Add a test to verify classes in vmStruct have unique vtables [v3] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 02:28:32 GMT, Alex Menkov wrote: >> Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. >> The fix adds test to verify this requirement. 
>> >> The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Chris's feedback Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12820 From sspitsyn at openjdk.org Wed Mar 8 04:18:12 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Mar 2023 04:18:12 GMT Subject: RFR: 8303690: Prefer ArrayList to LinkedList in com.sun.jmx.mbeanserver.Introspector In-Reply-To: References: Message-ID: <3NF8hPfHpmdfCK_RXtyLSN31JyZdUL-wCghBrqzBShQ=.40bf929f-cf63-4f94-9a5e-67bae48b8556@github.com> On Thu, 2 Mar 2023 20:16:35 GMT, Andrey Turbanov wrote: > `LinkedList` is used as the value in `com.sun.jmx.mbeanserver.Introspector.SimpleIntrospector#cache` > It's created, filled (with `add`) and then iterated. Nothing is removed from the head or anything like that. `ArrayList` should be preferred as the more efficient and more widely used collection. > I've also done some related code cleanup: > 1. removed redundant `if` from `SoftReference` value check > 2. fixed a typo in javadoc Good fix. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12839 From sspitsyn at openjdk.org Wed Mar 8 04:32:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Mar 2023 04:32:11 GMT Subject: RFR: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 18:16:25 GMT, Chris Plummer wrote: > The test failure is caused by the arrival of unexpected ThreadStartEvents, which mess up the debugger side. The events are for threads we normally only see getting created when using virtual threads, such as carrier threads and the VirtualThread-unparker thread. 
Theoretically this issue could happen without virtual threads due to other VM threads starting up such as Common-Cleaner, but we haven't seen it fail for that reason yet. > > The test is testing proper thread suspension for ThreadStartEvent using each of the 3 suspension policy types. The debuggee creates a sequence of 3 debuggee threads, each one's timing coordinated with some complicated synchronization with the debugger using breakpoints and the setting of fields in the debuggee (and careful placement of suspend/resume in the debugger). The ThreadStartRequests that the debugger sets up always use a "count filter" of 1, which means the requests are good for delivering exactly 1 ThreadStartEvent, and any that come after the first will get filtered out. So when an an unexpected ThreadStartEvent arrives for something like a carrier thread, this prematurely moves the debugger on to the next step, and the synchronization with the debuggee gets messed up. > > The first step in fixing this test was to remove the count filter, so the request can handle any number of ThreadStartEvents. > > The next step was then fixing the test library code in EventHandler.java so it would filter out any undesired ThreadStartEvents, so the test just ends up getting one event, and always for the thread it is expecting. There are a few parts to this. One is improving EventFilters.filter() to filter out more threads that tend to be created during VM startup, including carrier threads and the VirtualThread-unparker thread. > > It was necessary to add some calls EventFilters.filter() from EventHandler. This was done by adding a ThreadStartEvent listener for the "spurious" thread starts (those the test debuggee does not create). This listener is added by waitForRequestedEventCommon(), which is indirectly called by the test when is calls waitForRequestedEventSet(). > > There is a also 2nd place where the ThreadStartEvent listener for "spurious" threads is needed. 
It is now also installed with the default listeners that are always in place. It is needed when the test is not actually waiting for a ThreadStartEvent, but is waiting for a BreakpointEvent. waitForRequestedEventCommon() is not used in this case (so none of its listeners are installed), but the default listeners are always in place and can be used to filter these ThreadStartEvents. Note this filter will also be in place when calling waitForRequestedEventCommon(), but we can't rely on it when waitForRequestedEventCommon() is used for ThreadStartEvents because the spurious ThreadStartEvent will be seen and returned before we ever get to the default filter. So we actually end up with this ThreadStartEvent listener installed twice during waitForRequestedEventCommon(). > > I did a bit of cleanup on the test, mostly renaming of threads and ThreadStartRequests so they are easier to match up with the iteration # we use in both the debuggee and debugger (0, 1, and 2). The only real change in the test itself is removing the filter count, and verifying that the ThreadStartEvent is for the expected thread. Marked as reviewed by sspitsyn (Reviewer). Looks good. Thanks, Serguei test/hotspot/jtreg/vmTestbase/nsk/share/jdi/EventHandler.java line 270: > 268: if (event instanceof ThreadStartEvent) { > 269: if (EventFilters.filtered(event)) { > 270: display(owner +": Ignoring spurious thread creation: " + event); Nit: Space is needed after first '+' sign. ------------- PR: https://git.openjdk.org/jdk/pull/12861
From sspitsyn at openjdk.org Wed Mar 8 04:41:14 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Mar 2023 04:41:14 GMT Subject: RFR: 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 09:20:23 GMT, Kevin Walls wrote: > Test update for an occasional failure, which does not reproduce. > > The test failure in JDK-8303136 is at line 141 in the updated file here. It's the failure where isExceeded is true, but our sampled "used" value is not above the threshold. But while the comment says it's refreshing values, it does not refresh "used", so there could have been gc/promotion activity which hits the threshold outside of the test's control. Refreshing "used" is the addition here. > > Separately, the code at line 123 in the new file also claims to refresh the values, but it only refreshes the threshold, which we aren't changing. Not making it refresh "used" at that point looks correct, so remove the "if (used >= threshold)" as we have already checked that at line 116. Looks good to me. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12823 From sspitsyn at openjdk.org Wed Mar 8 05:15:05 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Mar 2023 05:15:05 GMT Subject: RFR: 8303617: update for deprecated sprintf for jdk.jdwp.agent [v3] In-Reply-To: References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> Message-ID: On Tue, 7 Mar 2023 02:16:56 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns.
The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in jdk.jdwp.agent module. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12870 From aturbanov at openjdk.org Wed Mar 8 07:23:22 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 8 Mar 2023 07:23:22 GMT Subject: Integrated: 8303690: Prefer ArrayList to LinkedList in com.sun.jmx.mbeanserver.Introspector In-Reply-To: References: Message-ID: The message from this sender included one or more files which could not be scanned for virus detection; do not open these files unless you are certain of the sender's intent. ---------------------------------------------------------------------- On Thu, 2 Mar 2023 20:16:35 GMT, Andrey Turbanov wrote: > `LinkedList` is used as value in `com.sun.jmx.mbeanserver.Introspector.SimpleIntrospector#cache` > It's created, filled (with `add`) and then iterated. No removes from the head or something like this. `ArrayList` should be preferred as more efficient and widely used collection. > Also I've done some related code cleaned: > 1. removed redundand `if` from `SoftReference` value check > 2. fixed a typo in javadoc This pull request has now been integrated. 
Changeset: 1d071d08 Author: Andrey Turbanov URL: https://git.openjdk.org/jdk/commit/1d071d0817714ee2f1bd2af5f9556f7d268dd0fa Stats: 8 lines in 1 file changed: 1 ins; 3 del; 4 mod 8303690: Prefer ArrayList to LinkedList in com.sun.jmx.mbeanserver.Introspector Reviewed-by: stsypanov, kevinw, cjplummer, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12839 From kevinw at openjdk.org Wed Mar 8 08:20:16 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 Mar 2023 08:20:16 GMT Subject: RFR: 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 09:20:23 GMT, Kevin Walls wrote: > Test update for an occasional failure, which does not reproduce. > > The test failure in JDK-8303136 is at line 141 in the updated file here. It's the failure where isExceeded is true, but our sampled "used" value is not above the threshold. But while the comment says it's refreshing values, it does not not refresh "used", so there could have been gc/promotion activity which hits the threshold outside of the test's control. Refreshing "used" is the addition here. > > Separately, the code at line 123 in the new file also claims to refresh the values, but it only refreshes the threshold, which we aren't changing. Not making it refresh "used" at that point looks correct, so remove the "if (used >= threshold)" as we have already checked that at line 116. Thanks Chris, thanks Serguei! 
------------- PR: https://git.openjdk.org/jdk/pull/12823 From kevinw at openjdk.org Wed Mar 8 08:24:21 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 Mar 2023 08:24:21 GMT Subject: Integrated: 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 09:20:23 GMT, Kevin Walls wrote: > Test update for an occasional failure, which does not reproduce. > > The test failure in JDK-8303136 is at line 141 in the updated file here. It's the failure where isExceeded is true, but our sampled "used" value is not above the threshold. But while the comment says it's refreshing values, it does not not refresh "used", so there could have been gc/promotion activity which hits the threshold outside of the test's control. Refreshing "used" is the addition here. > > Separately, the code at line 123 in the new file also claims to refresh the values, but it only refreshes the threshold, which we aren't changing. Not making it refresh "used" at that point looks correct, so remove the "if (used >= threshold)" as we have already checked that at line 116. This pull request has now been integrated. 
Changeset: afda8fbf Author: Kevin Walls URL: https://git.openjdk.org/jdk/commit/afda8fbf0bcea18cbe741e9c693789ebe0c6c4c5 Stats: 12 lines in 1 file changed: 1 ins; 4 del; 7 mod 8303136: MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005 failed with "isCollectionUsageThresholdExceeded() returned true, while threshold = 1 and used = 0" Reviewed-by: cjplummer, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12823 From rkennke at openjdk.org Wed Mar 8 11:24:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 Mar 2023 11:24:14 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v14] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use realloc instead of malloc+copy when growing the lock-stack ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/9d4ca05f..12c2b8c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=12-13 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907
From djelinski at openjdk.org Wed Mar 8 11:59:09 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Wed, 8 Mar 2023 11:59:09 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions Message-ID:
This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. Other changes include:
- the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere.
- While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in a context where `LastError` was not set.
- jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior.
- zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code.
- `getLastErrorString` is no longer exported by libjava.
Tier1-3 tests continue to pass. No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. Tested this manually by installing the Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on the exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks.
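The manual check described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the class `ErrorMessageCheck` and the helper `looksGarbled` are hypothetical names that are not part of the patch, and whether the lookup fails at all, and in which language the message arrives, depends on the machine's resolver and display language.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ErrorMessageCheck {
    // Hypothetical helper (not part of the patch): reports whether the text
    // contains a run of three or more '?', the usual symptom of characters
    // that were lost in a charset conversion.
    static boolean looksGarbled(String message) {
        return message != null && message.contains("???");
    }

    public static void main(String[] args) {
        try {
            // Same lookup as in the manual test; the name is expected not to resolve.
            InetAddress.getByName("nonexistent.local");
            System.out.println("Host unexpectedly resolved; nothing to check");
        } catch (UnknownHostException e) {
            String msg = e.getMessage();
            System.out.println("message: " + msg);
            System.out.println("garbled: " + looksGarbled(msg));
        }
    }
}
```

The heuristic is deliberately crude: it only detects the replacement characters and does not verify that the localized text is the expected one.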
------------- Commit messages: - Update copyrights - Remove LastError - Change getLastErrorString to return a jstring Changes: https://git.openjdk.org/jdk/pull/12922/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12922&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303814 Stats: 138 lines in 8 files changed: 19 ins; 44 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/12922.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12922/head:pull/12922 PR: https://git.openjdk.org/jdk/pull/12922 From mgronlun at openjdk.org Wed Mar 8 12:54:13 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 8 Mar 2023 12:54:13 GMT Subject: RFR: 8257967: JFR: Event for loaded agents Message-ID: Greetings, We are adding support to let JFR report on Agents. #### Design An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
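For readers less familiar with the API, a Java agent of the kind described above is a class exposing one or both of the standard `java.lang.instrument` entry points, `premain` and `agentmain`. The following is a minimal sketch, not code from this PR; the class name `ExampleAgent` and the `initializedBy` field are hypothetical and exist only for illustration.

```java
import java.lang.instrument.Instrumentation;

public class ExampleAgent {
    // Hypothetical bookkeeping field: remembers which entry point ran last.
    static volatile String initializedBy;

    // Entry point the JVM invokes for agents given on the command line
    // with -javaagent:<jar>[=<options>].
    public static void premain(String options, Instrumentation inst) {
        initializedBy = "premain";
        System.out.println("premain, options=" + options);
    }

    // Entry point the JVM invokes for agents loaded into an already running VM.
    public static void agentmain(String options, Instrumentation inst) {
        initializedBy = "agentmain";
        System.out.println("agentmain, options=" + options);
    }
}
```

To be usable as an agent, the class would be packaged in a jar whose manifest names it in the `Premain-Class` and/or `Agent-Class` attributes; everything after the `=` in `-javaagent:ExampleAgent.jar=foo=bar` arrives as the `options` string.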
To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: // Command line jdk.JavaAgent { startTime = 12:31:19.789 (2023-03-08) name = "JavaAgent.jar" options = "foo=bar" initialization = 12:31:15.574 (2023-03-08) initializationTime = 172 ms initializationMethod = "premain" } // Dynamic load jdk.JavaAgent { startTime = 12:31:31.158 (2023-03-08) name = "JavaAgent.jar" options = "bar=baz" initialization = 12:31:31.037 (2023-03-08) initializationTime = 64,1 ms initializationMethod = "agentmain" } The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically. "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". An agent can also be written in a native programming language, using either the JVM Tools Interface (JVMTI) or JVM Profiling Interface (JVMPI). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
The JVMTI standard specification is [here](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). JVMPI is an older interface, not a standard and is considered superseded by JVMTI, but the support is still in the JVM for agents started on the command line: -XRunMyAgent.jar To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: jdk.NativeAgent { startTime = 12:31:40.398 (2023-03-08) name = "jdwp" options = "transport=dt_socket,server=y,address=any,onjcmd=y" path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" dynamic = false initialization = 12:31:36.142 (2023-03-08) initializationTime = 0,00184 ms } The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and the event also indicates whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now). The "initialization" is the time the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. #### Implementation There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping and use it to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. In order to implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also jvmtiExport.cpp. The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from its older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. The previous lists used to maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. Testing: jdk_jfr, tier 1 - 6 Thanks Markus ------------- Commit messages: - event_names - adjustment - 8257967 Changes: https://git.openjdk.org/jdk/pull/12923/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8257967 Stats: 1862 lines in 22 files changed: 1353 ins; 485 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From kevinw at openjdk.org Wed Mar 8 13:20:16 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 Mar 2023 13:20:16 GMT Subject: RFR: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" In-Reply-To: References: Message-ID:
On Fri, 3 Mar 2023 18:16:25 GMT, Chris Plummer wrote: > The test failure is caused by the arrival of unexpected ThreadStartEvents, which mess up the debugger side. The events are for threads we normally only see getting created when using virtual threads, such as carrier threads and the VirtualThread-unparker thread. Theoretically this issue could happen without virtual threads due to other VM threads starting up such as Common-Cleaner, but we haven't seen it fail for that reason yet. > > The test is testing proper thread suspension for ThreadStartEvent using each of the 3 suspension policy types. The debuggee creates a sequence of 3 debuggee threads, each one's timing coordinated with some complicated synchronization with the debugger using breakpoints and the setting of fields in the debuggee (and careful placement of suspend/resume in the debugger). The ThreadStartRequests that the debugger sets up always use a "count filter" of 1, which means the requests are good for delivering exactly 1 ThreadStartEvent, and any that come after the first will get filtered out. So when an unexpected ThreadStartEvent arrives for something like a carrier thread, this prematurely moves the debugger on to the next step, and the synchronization with the debuggee gets messed up. > > The first step in fixing this test was to remove the count filter, so the request can handle any number of ThreadStartEvents. > > The next step was then fixing the test library code in EventHandler.java so it would filter out any undesired ThreadStartEvents, so the test just ends up getting one event, and always for the thread it is expecting. There are a few parts to this. One is improving EventFilters.filter() to filter out more threads that tend to be created during VM startup, including carrier threads and the VirtualThread-unparker thread. > > It was necessary to add some calls to EventFilters.filter() from EventHandler.
This was done by adding a ThreadStartEvent listener for the "spurious" thread starts (those the test debuggee does not create). This listener is added by waitForRequestedEventCommon(), which is indirectly called by the test when is calls waitForRequestedEventSet(). > > There is a also 2nd place where the ThreadStartEvent listener for "spurious" threads is needed. It is now also installed with the default listeners that are always in place. It is needed when the test is not actually waiting for a ThreadStartEvent, but is waiting for a BreakpointEvent. waitForRequestedEventCommon() is not used in this case (so none of its listeners are installed), but the default listeners are always in place and can be used to filter these ThreadStartEvents. Note this filter will also be in place when calling waitForRequestedEventCommon(), but we can't realy on it when waitForRequestedEventCommon() is used for ThreadStartEvents because the spurious ThreadStartEvent will be seen and returned before we ever get to the default filter. So we actually end up with this ThreadStartEvent listener installed twice during waitForRequestedEventCommon(). > > I did a bit of cleanup on the test, mostly renaming of threads and ThreadStartRequests so they are easier to match up with the iteration # we use in both the debuggee and debugger (0, 1, and 2). The only real change in the test itself is removing the filter count, and verifying that the ThreadStartEvent is for the expected thread. test/hotspot/jtreg/vmTestbase/nsk/jdi/EventSet/resume/resume008.java line 150: > 148: return; > 149: } > 150: } Not your typo here, but just below on line 157 now there's the "propety" spelling error which was in the failure logs. 
------------- PR: https://git.openjdk.org/jdk/pull/12861 From kevinw at openjdk.org Wed Mar 8 13:25:07 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 Mar 2023 13:25:07 GMT Subject: RFR: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 18:16:25 GMT, Chris Plummer wrote: > The test failure is caused by the arrival of unexpected ThreadStartEvents, which mess up the debugger side. The events are for threads we normally only see getting created when using virtual threads, such as carrier threads and the VirtualThread-unparker thread. Theoretically this issue could happen without virtual threads due to other VM threads starting up such as Common-Cleaner, but we haven't seen it fail for that reason yet. > > The test is testing proper thread suspension for ThreadStartEvent using each of the 3 suspension policy types. The debuggee creates a sequence of 3 debuggee threads, each one's timing coordinated with some complicated synchronization with the debugger using breakpoints and the setting of fields in the debuggee (and careful placement of suspend/resume in the debugger). The ThreadStartRequests that the debugger sets up always use a "count filter" of 1, which means the requests are good for delivering exactly 1 ThreadStartEvent, and any that come after the first will get filtered out. So when an an unexpected ThreadStartEvent arrives for something like a carrier thread, this prematurely moves the debugger on to the next step, and the synchronization with the debuggee gets messed up. > > The first step in fixing this test was to remove the count filter, so the request can handle any number of ThreadStartEvents. > > The next step was then fixing the test library code in EventHandler.java so it would filter out any undesired ThreadStartEvents, so the test just ends up getting one event, and always for the thread it is expecting. 
There are a few parts to this. One is improving EventFilters.filter() to filter out more threads that tend to be created during VM startup, including carrier threads and the VirtualThread-unparker thread. > > It was necessary to add some calls to EventFilters.filter() from EventHandler. This was done by adding a ThreadStartEvent listener for the "spurious" thread starts (those the test debuggee does not create). This listener is added by waitForRequestedEventCommon(), which is indirectly called by the test when it calls waitForRequestedEventSet(). > > There is also a 2nd place where the ThreadStartEvent listener for "spurious" threads is needed. It is now also installed with the default listeners that are always in place. It is needed when the test is not actually waiting for a ThreadStartEvent, but is waiting for a BreakpointEvent. waitForRequestedEventCommon() is not used in this case (so none of its listeners are installed), but the default listeners are always in place and can be used to filter these ThreadStartEvents. Note this filter will also be in place when calling waitForRequestedEventCommon(), but we can't rely on it when waitForRequestedEventCommon() is used for ThreadStartEvents because the spurious ThreadStartEvent will be seen and returned before we ever get to the default filter. So we actually end up with this ThreadStartEvent listener installed twice during waitForRequestedEventCommon(). > > I did a bit of cleanup on the test, mostly renaming of threads and ThreadStartRequests so they are easier to match up with the iteration # we use in both the debuggee and debugger (0, 1, and 2). The only real change in the test itself is removing the filter count, and verifying that the ThreadStartEvent is for the expected thread. Looks good. If in future we ever find ourselves updating that filter list, it could use a parameter to let us require only THREAD_NAME_PREFIX, rather than maintain the list of what we don't want.
8-) ------------- Marked as reviewed by kevinw (Committer). PR: https://git.openjdk.org/jdk/pull/12861 From mgronlun at openjdk.org Wed Mar 8 15:16:28 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 8 Mar 2023 15:16:28 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v2] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > initializationMethod = "premain" > } > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > initializationMethod = "agentmain" > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
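As a hedged illustration of how such a periodic event might be consumed, the sketch below uses the JFR streaming API (`jdk.jfr.consumer.RecordingStream`). The event name `jdk.JavaAgent` and the field names `name`, `options` and `initializationMethod` are taken from the proposal quoted above; they exist only in JDK builds that include this change, and the final names may differ. The class `AgentEventStream`, its methods, and the one-second period are assumptions made purely for the example.

```java
import java.time.Duration;
import java.util.function.Consumer;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical helper, not part of the PR under review.
final class AgentEventStream {
    // Event name as proposed in the PR description (assumption for other JDK builds).
    static final String EVENT_NAME = "jdk.JavaAgent";

    // Creates a stream that asks JFR to emit the periodic event once per second
    // and forwards a one-line summary of each event to the given sink.
    // The caller is responsible for starting and closing the stream.
    static RecordingStream open(Consumer<String> sink) {
        RecordingStream stream = new RecordingStream();
        stream.enable(EVENT_NAME).withPeriod(Duration.ofSeconds(1));
        stream.onEvent(EVENT_NAME, event -> sink.accept(describe(event)));
        return stream;
    }

    // Builds a summary from the fields named in the proposal, tolerating absent fields.
    static String describe(RecordedEvent event) {
        return field(event, "name")
                + " options=" + field(event, "options")
                + " via " + field(event, "initializationMethod");
    }

    private static String field(RecordedEvent event, String name) {
        if (!event.hasField(name)) {
            return "<absent>";
        }
        Object value = event.getValue(name); // typed as Object to avoid overload surprises
        return String.valueOf(value);
    }
}
```

A caller would typically invoke `startAsync()` on the returned stream and close it when done. Checking `hasField` before reading keeps the handler tolerant of field names that changed between this proposal and whatever finally shipped.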
> > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar > > The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically. > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language, using either the JVM Tools Interface (JVMTI) or JVM Profiling Interface (JVMPI). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
> > The JVMTI standard specification is [here](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html) > > JVMPI is an older interface, not a standard and is considered superseded by JVMTI, but the support is still in the JVM for agents started on the command line: -XRunMyAgent.jar > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also a denotation if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now). The "initialization" is the time the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
> > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > To implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous lists that maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: razor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/c50cca53..ed1ea797 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=00-01 Stats: 114 lines in 3 files changed: 33 ins; 74 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From coleenp at openjdk.org Wed Mar 8 15:40:33 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 8 Mar 2023 15:40:33 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 22:14:39 GMT,
Patricio Chilano Mateo wrote: > Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(), in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we found a return barrier pc (as in Continuation::is_return_barrier_entry()) and we want to keep walking the physical stack then we know the sender will be the enterSpecial frame so we create it by calling ContinuationEntry::to_frame(). This method assumes there can only be one nmethod associated with enterSpecial() so we hit an assert later on. See the bug for more details of the crash. > > As I mentioned in the bug we don't need to rely on this assumption since we can re-read the updated value from _enter_special. But reading both _enter_special and _return_pc means we would need some kind of synchronization since to_frame() could be called concurrently with set_enter_code(). To avoid that we could just read _return_pc and calculate the blob from it each time, but I'm also assuming that overhead is undesired and that's why the static variable was introduced. Alternatively _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). But if we go this route I think we would need to do a small fix on thaw too. After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). 
I'm not sure about the lifecycle of the old CodeBlob but it seems it could have already been removed if enterSpecial was not found while traversing everybody's stack. Maybe there are more issues. > > The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another less restrictive option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired. > > I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test. > > Thanks, > Patricio That's a good fix and a good place for it. Thank you for figuring this out. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12911 From xuelei at openjdk.org Wed Mar 8 16:11:12 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 8 Mar 2023 16:11:12 GMT Subject: Integrated: 8303617: update for deprecated sprintf for jdk.jdwp.agent In-Reply-To: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> References: <87Sa4YRlfruHP4t5UOcMLDvDi1kX4gx0QjgKMN7WA4E=.25db3242-1687-40a6-ae11-59585df1d3a9@github.com> Message-ID: On Sat, 4 Mar 2023 06:29:20 GMT, Xue-Lei Andrew Fan wrote: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in jdk.jdwp.agent module. > > Thanks, > Xuelei This pull request has now been integrated.
Changeset: d287a5e9 Author: Xue-Lei Andrew Fan URL: https://git.openjdk.org/jdk/commit/d287a5e9d8e1b87397694781772c4ddbf5e4f4a4 Stats: 11 lines in 3 files changed: 0 ins; 4 del; 7 mod 8303617: update for deprecated sprintf for jdk.jdwp.agent Reviewed-by: cjplummer, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12870 From coleenp at openjdk.org Wed Mar 8 16:37:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 8 Mar 2023 16:37:23 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 14:50:34 GMT, Frederic Parain wrote: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. > > The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. I was able to do a first pass through the vm code except for jvmci.
I didn't look at tests or SA in this pass. src/hotspot/share/ci/ciFlags.hpp line 47: > 45: > 46: ciFlags() { _flags = 0; _stable = false; _intialized_final_update = false; } > 47: ciFlags(AccessFlags flags, bool is_stable= false, bool is_initialized_final_update = false) { This should use constructor initializer syntax. src/hotspot/share/classfile/classFileParser.cpp line 1491: > 1489: _temp_field_info = new GrowableArray(total_fields); > 1490: > 1491: ResourceMark rm(THREAD); Is the ResourceMark ok here or should it go before allocating _temp_field_info ? src/hotspot/share/classfile/classFileParser.cpp line 1608: > 1606: fflags.update_injected(true); > 1607: AccessFlags aflags; > 1608: FieldInfo fi(aflags, (u2)(injected[n].name_index), (u2)(injected[n].signature_index), 0, fflags); I don't know why there's a cast here until I read more. If the FieldInfo name_index and signature_index fields are only u2 sized, could you pass this as an int and then in the constructor assert that the value doesn't overflow u2 instead? src/hotspot/share/classfile/classFileParser.cpp line 1634: > 1632: for(int i = 0; i < _temp_field_info->length(); i++) { > 1633: name = _temp_field_info->adr_at(i)->name(_cp); > 1634: sig = _temp_field_info->adr_at(i)->signature(_cp); This checking for duplicates looks like a good candidate for a separate function because parse_fields is so long. I'm adding this comment to remember to file an RFE to look into making this function shorter and factor out this code. src/hotspot/share/classfile/classFileParser.cpp line 6024: > 6022: int injected_fields_count = _temp_field_info->length() - _java_fields_count; > 6023: _fieldinfo_stream = FieldInfoStream::create_FieldInfoStream(_temp_field_info, _java_fields_count, injected_fields_count, loader_data(), CHECK); > 6024: _fields_status = MetadataFactory::new_array(_loader_data, _temp_field_info->length(), FieldStatus(0), CHECK); These lines seem long, could you reformat? 
src/hotspot/share/classfile/fieldLayoutBuilder.cpp line 554: > 552: FieldInfo ctrl = _field_info->at(0); > 553: FieldGroup* group = nullptr; > 554: FieldInfo tfi = *it; What's the 't' in tfi? Maybe a longer variable name would be helpful here. src/hotspot/share/classfile/javaClasses.cpp line 871: > 869: // a new UNSIGNED5 stream, and substitute it to the old FieldInfo stream. > 870: > 871: int java_fields; Can you put InstanceKlass* ik = InstanceKlass::cast(k); here and use that so there's only one cast? src/hotspot/share/classfile/javaClasses.cpp line 873: > 871: int java_fields; > 872: int injected_fields; > 873: GrowableArray* fields = FieldInfoStream::create_FieldInfoArray(InstanceKlass::cast(k)->fieldinfo_stream(), &java_fields, &injected_fields); This line looks too long too. src/hotspot/share/oops/fieldInfo.hpp line 31: > 29: #include "memory/metadataFactory.hpp" > 30: #include "oops/constantPool.hpp" > 31: #include "oops/symbol.hpp" Since you added an inline.hpp function can you move the functions that rely on including constantPool.hpp, symbol.hpp and metadataFactory.hpp into the inline.hpp file? src/hotspot/share/oops/fieldInfo.hpp line 180: > 178: u2 generic_signature_index() const { return _generic_signature_index; } > 179: void set_generic_signature_index(u2 index) { _generic_signature_index = index; } > 180: u2 contention_group() const { return _contention_group; } Can you align the { in these one line functions? src/hotspot/share/oops/fieldStreams.hpp line 28: > 26: #define SHARE_OOPS_FIELDSTREAMS_HPP > 27: > 28: #include "oops/instanceKlass.inline.hpp" including .inline.hpp from .hpp is against the guidelines. You should move things and include instanceKlass.inline.hpp in fieldStreams.inline.hpp instead. src/hotspot/share/oops/fieldStreams.hpp line 104: > 102: AccessFlags flags; > 103: flags.set_flags(field()->access_flags()); > 104: return flags; Did this used to do this for a reason? 
src/hotspot/share/oops/fieldStreams.inline.hpp line 28: > 26: #define SHARE_OOPS_FIELDSTREAMS_INLINE_HPP > 27: > 28: #include "oops/fieldInfo.inline.hpp" I don't know if you have to include oops/fieldInfo.inline.hpp but the include line for fieldStreams.hpp should be by itself and then this new include should be below with runtime/javaThread.hpp src/hotspot/share/oops/instanceKlass.hpp line 275: > 273: // Fields information is stored in an UNSIGNED5 encoded stream (see fieldInfo.hpp) > 274: Array* _fieldinfo_stream; > 275: Array* _fields_status; Can you align these two field identifiers? src/hotspot/share/prims/jvmtiRedefineClasses.cpp line 3582: > 3580: } > 3581: if (update_required) { > 3582: Array* old_stream = InstanceKlass::cast(scratch_class)->fieldinfo_stream(); scratch_class should already be an InstanceKlass, ie cast not required here or below. src/hotspot/share/runtime/reflectionUtils.hpp line 29: > 27: > 28: #include "memory/allStatic.hpp" > 29: #include "oops/instanceKlass.inline.hpp" Also here cannot include .inline.hpp in .hpp file. ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855 From rkennke at openjdk.org Wed Mar 8 18:25:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 Mar 2023 18:25:15 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v15] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. 
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 - Inline initial LockStack stack ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/12c2b8c3..3c9d0d82 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=13-14 Stats: 15 lines in 2 files changed: 11 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From pchilanomate at openjdk.org Wed Mar 8 18:48:17 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 8 Mar 2023 18:48:17 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb ==
CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: <9TIDrqZx-KDQ0tWE7BGiGYMSS5Ftb9y3B8OLcdPvrUs=.5cb94158-ab8b-47ea-96ba-a4e073a124d6@github.com> On Wed, 8 Mar 2023 15:37:12 GMT, Coleen Phillimore wrote: > That's a good fix and a good place for it. Thank you for figuring this out. > Thanks for the review Coleen! ------------- PR: https://git.openjdk.org/jdk/pull/12911 From mgronlun at openjdk.org Wed Mar 8 18:50:18 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 8 Mar 2023 18:50:18 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v3] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. 
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > initializationMethod = "premain" > } > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > initializationMethod = "agentmain" > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar > > The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically. > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language, using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also a denotation if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now). The "initialization" is the time the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
> > To implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous lists that maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/ed1ea797..26172f0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=01-02 Stats: 13 lines in 2 files changed: 5 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Wed Mar 8 18:56:55 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 8 Mar 2023 18:56:55 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime.
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > initializationMethod = "premain" > } > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > initializationMethod = "agentmain" > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar > > The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically. > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. 
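A minimal sketch of how such events could be inspected once a recording exists, assuming the event type name `jdk.JavaAgent` from this proposal survives into the final implementation. The class name `AgentEventDump` and the helper `isJavaAgentEvent` are hypothetical; only the `jdk.jfr.consumer` calls are existing JDK API. It prints matching events as a whole rather than reading individual fields, since the field names and types described above are still under review:

```java
import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper, not part of the PR.
class AgentEventDump {
    // Assumption: event type name as proposed in this thread.
    static final String JAVA_AGENT_EVENT = "jdk.JavaAgent";

    // True if the given event type name is the proposed Java agent event.
    static boolean isJavaAgentEvent(String eventTypeName) {
        return JAVA_AGENT_EVENT.equals(eventTypeName);
    }

    // Streams through a recording file and prints every matching event.
    static void dump(Path recording) throws IOException {
        try (RecordingFile file = new RecordingFile(recording)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (isJavaAgentEvent(event.getEventType().getName())) {
                    // RecordedEvent.toString() gives a human-readable rendering.
                    System.out.println(event);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: AgentEventDump <recording.jfr>");
            return;
        }
        dump(Path.of(args[0]));
    }
}
```

On a JDK that does not define the event type, the loop simply finds no matching events.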
> > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language, using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also a denotation if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now). The "initialization" is the time the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization;
as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > To implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous lists that maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. 
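The description does not say how the new list supports concurrent iteration, and the real code is C++ inside HotSpot. Purely as an illustration of one common technique (an add-only linked list whose nodes are immutable once published), a Java sketch with hypothetical names might look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not the HotSpot agent list.
class AgentListSketch {
    // Nodes never change after construction, so readers can walk them freely.
    private static final class Node {
        final String name;
        final Node next;

        Node(String name, Node next) {
            this.name = name;
            this.next = next;
        }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();

    // Prepends with a compare-and-set retry loop; no lock is taken.
    void add(String name) {
        Node current;
        Node node;
        do {
            current = head.get();
            node = new Node(name, current);
        } while (!head.compareAndSet(current, node));
    }

    // Walks from the head observed at call time; concurrent adds are
    // either fully visible or not visible at all. Newest entry comes first.
    List<String> snapshot() {
        List<String> names = new ArrayList<>();
        for (Node n = head.get(); n != null; n = n.next) {
            names.add(n.name);
        }
        return names;
    }
}
```

Because entries are only ever added, an iterating thread never sees a half-updated link. Supporting removal would need a different scheme.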
> > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: - remove JVMPI - cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/26172f0e..355d307c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From naoto at openjdk.org Wed Mar 8 20:05:16 2023 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 8 Mar 2023 20:05:16 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:30:27 GMT, Daniel Jeliński wrote: > This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. > > Other changes include: > - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. > - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. > - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. > - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. > - `getLastErrorString` is no longer exported by libjava.
> > Tier1-3 tests continue to pass. > > No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. > Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. Hi Daniel, I like the idea of using `jstring` directly to avoid platform string. My concern is whether it is OK to stop exporting it from libjava. Would it cause any concern on other platforms? Also, it is now inconsistent with its sibling `getErrorString()` call. ------------- PR: https://git.openjdk.org/jdk/pull/12922
From djelinski at openjdk.org Wed Mar 8 22:03:14 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Wed, 8 Mar 2023 22:03:14 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:30:27 GMT, Daniel Jeliński wrote: > This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. > > [...] I think it's fine to remove `getLastErrorString` from the list of libjava exports. After my changes it's no longer used outside libjava, and it was never a part of the JDK's public interface. Regarding consistency with `getErrorString`, I'm not too concerned about that; the functions have similar names but serve different purposes. ------------- PR: https://git.openjdk.org/jdk/pull/12922 From cjplummer at openjdk.org Wed Mar 8 22:17:14 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 8 Mar 2023 22:17:14 GMT Subject: RFR: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 13:21:48 GMT, Kevin Walls wrote: > If in future we ever find ourselves updating that filter list, it could use a parameter to let us require only THREAD_NAME_PREFIX, rather than maintain the list of what we don't want.
8-) The threads not to filter could be passed to EventHandler.waitForRequestedEventSet(), but that doesn't solve the other issue with JDIBase.breakpointForCommunication(), which doesn't know the set of threads we care about and also relies on EventFilters.filtered(event). I'm surprised we don't have more failures due to this. I think maybe most were fixed in other ways. For example, see https://github.com/openjdk/jdk/pull/5567, which fixed a similar problem. In that case the test uses JDIBase.getEventSetForThreadStartDeath(). I took a different approach since this test uses EventHandler.waitForRequestedEventSet(), and I didn't think converting to getEventSetForThreadStartDeath() was appropriate. ------------- PR: https://git.openjdk.org/jdk/pull/12861 From cjplummer at openjdk.org Wed Mar 8 22:39:55 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 8 Mar 2023 22:39:55 GMT Subject: RFR: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" [v2] In-Reply-To: References: Message-ID: <5hE-xsWLY1E1Lg8hZY0PvTB8-Mz7tg-iKXWrGfpoSLk=.19c50e04-2d07-4019-aabb-c121c0019709@github.com> > The test failure is caused by the arrival of unexpected ThreadStartEvents, which mess up the debugger side. The events are for threads we normally only see getting created when using virtual threads, such as carrier threads and the VirtualThread-unparker thread. Theoretically this issue could happen without virtual threads due to other VM threads starting up such as Common-Cleaner, but we haven't seen it fail for that reason yet. > > The test is testing proper thread suspension for ThreadStartEvent using each of the 3 suspension policy types. The debuggee creates a sequence of 3 debuggee threads, each one's timing coordinated with some complicated synchronization with the debugger using breakpoints and the setting of fields in the debuggee (and careful placement of suspend/resume in the debugger). 
The ThreadStartRequests that the debugger sets up always use a "count filter" of 1, which means the requests are good for delivering exactly 1 ThreadStartEvent, and any that come after the first will get filtered out. So when an unexpected ThreadStartEvent arrives for something like a carrier thread, this prematurely moves the debugger on to the next step, and the synchronization with the debuggee gets messed up. > > The first step in fixing this test was to remove the count filter, so the request can handle any number of ThreadStartEvents. > > The next step was then fixing the test library code in EventHandler.java so it would filter out any undesired ThreadStartEvents, so the test just ends up getting one event, and always for the thread it is expecting. There are a few parts to this. One is improving EventFilters.filter() to filter out more threads that tend to be created during VM startup, including carrier threads and the VirtualThread-unparker thread. > > It was necessary to add some calls to EventFilters.filter() from EventHandler. This was done by adding a ThreadStartEvent listener for the "spurious" thread starts (those the test debuggee does not create). This listener is added by waitForRequestedEventCommon(), which is indirectly called by the test when it calls waitForRequestedEventSet(). > > There is also a 2nd place where the ThreadStartEvent listener for "spurious" threads is needed. It is now also installed with the default listeners that are always in place. It is needed when the test is not actually waiting for a ThreadStartEvent, but is waiting for a BreakpointEvent. waitForRequestedEventCommon() is not used in this case (so none of its listeners are installed), but the default listeners are always in place and can be used to filter these ThreadStartEvents.
Note this filter will also be in place when calling waitForRequestedEventCommon(), but we can't rely on it when waitForRequestedEventCommon() is used for ThreadStartEvents because the spurious ThreadStartEvent will be seen and returned before we ever get to the default filter. So we actually end up with this ThreadStartEvent listener installed twice during waitForRequestedEventCommon(). > > I did a bit of cleanup on the test, mostly renaming of threads and ThreadStartRequests so they are easier to match up with the iteration # we use in both the debuggee and debugger (0, 1, and 2). The only real change in the test itself is removing the filter count, and verifying that the ThreadStartEvent is for the expected thread. Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: Fix a couple of typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12861/files - new: https://git.openjdk.org/jdk/pull/12861/files/f159416c..ea759bb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12861&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12861&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12861.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12861/head:pull/12861 PR: https://git.openjdk.org/jdk/pull/12861
From dholmes at openjdk.org Wed Mar 8 23:02:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Mar 2023 23:02:17 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 18:56:55 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> [...] > > Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: > > - remove JVMPI > - cleanup This seems a very large and intrusive change just to give some data about agents (sorry if that sounds flippant). IIUC you are generating events for stuff that (used to?) happens very early in the initialization process and for which you now need to load a whole heap of JFR classes - which could themselves be subject to the actions of the agent. The impact of this on the overall initialization process is very hard to gauge. src/hotspot/share/runtime/threads.cpp line 338: > 336: if (EagerXrunInit && Arguments::init_libraries_at_startup()) { > 337: create_vm_init_libraries(); > 338: } Not obvious where this went. Changes to the initialization order can be very problematic.
------------- PR: https://git.openjdk.org/jdk/pull/12923 From cjplummer at openjdk.org Wed Mar 8 23:26:13 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 8 Mar 2023 23:26:13 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:30:27 GMT, Daniel Jeliński wrote: > This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. > > [...] We don't have a test for the SA changes you made.
The best way to test it is with clhsdb. Run the following against a JVM pid: `$ jhsdb clhsdb --pid ` Use "jstack -v" to get a native pc from a frame, and then try `disassemble` on the address. It most likely will produce an exception since it can't find hsdis, which is actually what we want to be testing in this case: hsdb> disassemble 0x00007f38b371fca0 Error: sun.jvm.hotspot.debugger.DebuggerException: hsdis-amd64.so: cannot open shared object file: No such file or directory You'll need to test separately on Windows and any unix platform. ------------- PR: https://git.openjdk.org/jdk/pull/12922
From mgronlun at openjdk.org Wed Mar 8 23:32:08 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 8 Mar 2023 23:32:08 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 18:56:55 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> [...] > > Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: > > - remove JVMPI > - cleanup No need to load any JFR classes. No change to startup logic. ------------- PR: https://git.openjdk.org/jdk/pull/12923
From cjplummer at openjdk.org Wed Mar 8 23:41:05 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 8 Mar 2023 23:41:05 GMT Subject: RFR: 8303609: ProblemList serviceability/sa/TestSysProps.java with ZGC Message-ID: Although it takes both ZGC and -Xcomp to cause the test to fail, we have no way to problem list for just that combination, so I'm choosing to problem list with just ZGC since it is the main cause of the failure. I've only seen this issue on windows-x64, but there are clearly failures on linux and macos in mach5, so I'm problem listing for all platforms.
------------- Commit messages: - Problemlist TestSysProps Changes: https://git.openjdk.org/jdk/pull/12935/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12935&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303609 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12935.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12935/head:pull/12935 PR: https://git.openjdk.org/jdk/pull/12935 From dcubed at openjdk.org Wed Mar 8 23:59:04 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 8 Mar 2023 23:59:04 GMT Subject: RFR: 8303609: ProblemList serviceability/sa/TestSysProps.java with ZGC In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 22:40:56 GMT, Chris Plummer wrote: > Although it takes both ZGC and -Xcomp to cause the test to fail, we have no way to problem list for just that combination, so I'm choosing to problem list with just ZGC since it is the main cause of the failure. I've only seen this issue on windows-x64, but there are clearly failures on linux and macos in mach5, so I'm problem listing for all platforms. Thumbs up. This is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/12935 From dholmes at openjdk.org Thu Mar 9 00:26:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 Mar 2023 00:26:06 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 23:28:52 GMT, Markus Grönlund wrote: > No need to load any JFR classes. I thought JFR was all Java-based these days. But if no Java involved then that is good. > No change to startup logic. I flagged a change in my comment above.
------------- PR: https://git.openjdk.org/jdk/pull/12923 From djelinski at openjdk.org Thu Mar 9 08:54:06 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Thu, 9 Mar 2023 08:54:06 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 23:23:33 GMT, Chris Plummer wrote: >> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. >> >> Other changes include: >> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. >> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. >> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. >> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. >> - `getLastErrorString` is no longer exported by libjava. >> >> Tier1-3 tests continue to pass. >> >> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. >> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. > > We don't have a test for the SA changes you made.
The best way to test it is with clhsdb. Run the following against a JVM pid: > > `$ jhsdb clhsdb --pid <pid>` > > Use "jstack -v" to get a native pc from a frame, and then try `disassemble` on the address. It most likely will produce an exception since it can't find hsdis, which is actually what we want to be testing in this case: > > > hsdb> disassemble 0x00007f38b371fca0 > Error: sun.jvm.hotspot.debugger.DebuggerException: hsdis-amd64.so: cannot open shared object file: No such file or directory > > > You'll need to test separately on Windows and any unix platform. Thanks @plummercj for the instructions. Here are the results: Linux, with this change applied: hsdb> disassemble 0x00007f3484558da0 Error: sun.jvm.hotspot.debugger.DebuggerException: hsdis-amd64.so: cannot open shared object file: No such file or directory Windows, EN, with the change: hsdb> disassemble 0x00000107d4dae0c6 Error: sun.jvm.hotspot.debugger.DebuggerException: The specified module could not be found Windows, misconfigured CN, without the change: hsdb> disassemble 0x00000200d60de0b4 Error: sun.jvm.hotspot.debugger.DebuggerException: ????????? Windows, misconfigured CN, with the change: hsdb> disassemble 0x000001fab996e0b4 Error: sun.jvm.hotspot.debugger.DebuggerException: ????????? (note: I had to run `chcp 65001` in the console, otherwise the exception would still display incorrectly) ------------- PR: https://git.openjdk.org/jdk/pull/12922 From adinn at openjdk.org Thu Mar 9 09:18:08 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 9 Mar 2023 09:18:08 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID:
On Wed, 8 Mar 2023 23:28:52 GMT, Markus Grönlund wrote: >> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove JVMPI >> - cleanup > > No need to load any JFR classes. No change to startup logic. @mgronlun Why mark Java agents as command-line or dynamic using `initializationMethod = "premain"/"agentmain"` and mark native agents using `dynamic = true/false`? Why not use `dynamic` for both? ------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 09:24:16 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 09:24:16 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 23:28:52 GMT, Markus Grönlund wrote: >> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove JVMPI >> - cleanup > > No need to load any JFR classes. No change to startup logic. > @mgronlun Why mark Java agents as command-line or dynamic using `initializationMethod = "premain"/"agentmain"` and mark native agents using `dynamic = true/false`? Why not use `dynamic` for both? Hi Andrew, that's a good question. I thought it could be derived in the JavaAgent case, because there are only two entry points, "premain" implies static and "agentmain" implies dynamic. For the native case, there is no information about the callback (I had it, but it depends on symbols), so the dynamic field is made explicit. It can also be added to the JavaAgent if that makes it clearer.
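For readers less familiar with the two Java entry points mentioned here, the following is only a minimal sketch, not code from the PR. The class name `EntryPointAgent`, its static fields and the `isDynamic` helper are hypothetical; only the `premain`/`agentmain` signatures come from the `java.lang.instrument` package documentation.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; the name and the recorded fields are illustrative only.
final class EntryPointAgent {
    // Which entry point the JVM invoked, and the options string it passed.
    static volatile String entryPoint;
    static volatile String options;

    // Invoked for agents given on the command line (-javaagent:...), before main().
    public static void premain(String agentArgs, Instrumentation inst) {
        record("premain", agentArgs);
    }

    // Invoked for agents loaded into an already running VM.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        record("agentmain", agentArgs);
    }

    private static void record(String method, String agentArgs) {
        entryPoint = method;
        options = agentArgs;
    }

    // Derives a "dynamic" flag from the entry point name (assumed mapping:
    // "premain" means command line, "agentmain" means dynamic load).
    static boolean isDynamic(String method) {
        switch (method) {
            case "premain": return false;
            case "agentmain": return true;
            default: throw new IllegalArgumentException("unknown entry point: " + method);
        }
    }
}
```

Since the flag is a pure function of the entry point name, an event needs only one of the two to convey how the agent was loaded.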
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 09:32:19 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 09:32:19 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: <5bzaYlM6HXfUNJITjTSIaGgcJ_51OQf6XWr07w__wUw=.d0a9ac8b-a9a1-4122-9d2f-880de717d071@github.com> On Wed, 8 Mar 2023 22:56:31 GMT, David Holmes wrote: >> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove JVMPI >> - cleanup > > src/hotspot/share/runtime/threads.cpp line 338: > >> 336: if (EagerXrunInit && Arguments::init_libraries_at_startup()) { >> 337: create_vm_init_libraries(); >> 338: } > > Not obvious where this went. Changes to the initialization order can be very problematic. Thanks, David. Two calls to launch XRun agents are invoked during startup, and they depend on the EagerXrunInit option. The !EagerXrunInit case is already located in create_vm(), but the EagerXrunInit case was located as the first entry in initialize_java_lang_classes(), which I thought was tucked away a bit unnecessarily. I hoisted the EagerXrunInit case from initialize_java_lang_classes() into create_vm(). It's now the call just before initialize_java_lang_classes(). This made it clearer, i.e. to have both calls located directly in create_vm().
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 09:35:17 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 09:35:17 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 00:23:39 GMT, David Holmes wrote: > > No need to load any JFR classes. > > I thought JFR was all Java-based these days. But if no Java involved then that is good. Ehh, no. Far from it. > > No change to startup logic. > > I flagged a change in my comment above. Thanks, pls see my reply. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From adinn at openjdk.org Thu Mar 9 09:39:16 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 9 Mar 2023 09:39:16 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: <6lvLldbtBOQUw1f_3lz6yJ8G33tpzQolD95cBHHvhNY=.0b6f4c9d-01b3-4bd4-9638-68b2e28a7a65@github.com> On Wed, 8 Mar 2023 18:56:55 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize.
>> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> initialization = 12:31:15.574 (2023-03-08) >> initializationTime = 172 ms >> initializationMethod = "premain" >> } >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> initialization = 12:31:31.037 (2023-03-08) >> initializationTime = 64,1 ms >> initializationMethod = "agentmain" >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar >> >> The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically. >> >> "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". 
>> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> dynamic = false >> initialization = 12:31:36.142 (2023-03-08) >> initializationTime = 0,00184 ms >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also a denotation if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now). The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: > > - remove JVMPI > - cleanup Yes, I appreciate that `dynamic` can be derived from `initializationMethod` -- and vice versa. However, I was approaching this semantically from the opposite end. To me the primary characteristic that the user would be interested in is whether the agent was loaded dynamically or on the command line (whatever the type of agent). The corresponding fact, for a Java agent, that it is entered, respectively, via method agentmain or premain is a derived (implementation) detail. Is there a reason to mention that detail?
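To make the user-facing view concrete, here is a rough sketch of a recording consumer that only asks how each agent was loaded. It assumes both `jdk.JavaAgent` and `jdk.NativeAgent` carry a boolean `dynamic` field, which in this revision is true only of the native event, so events without the field are skipped. The class name `AgentLoadReport` and the output format are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper; not part of the PR.
final class AgentLoadReport {
    // Formats one agent as "<name>: dynamic" or "<name>: command line".
    static String describe(String name, boolean dynamic) {
        return name + ": " + (dynamic ? "dynamic" : "command line");
    }

    // Reads a recording file and describes every agent event that has
    // both a "name" and a "dynamic" field (assumed field names).
    static List<String> report(Path recording) throws IOException {
        List<String> lines = new ArrayList<>();
        for (RecordedEvent e : RecordingFile.readAllEvents(recording)) {
            String type = e.getEventType().getName();
            boolean agentEvent = type.equals("jdk.JavaAgent") || type.equals("jdk.NativeAgent");
            if (agentEvent && e.hasField("name") && e.hasField("dynamic")) {
                lines.add(describe(e.getString("name"), e.getBoolean("dynamic")));
            }
        }
        return lines;
    }
}
```

With a single `dynamic` field on both event types, a consumer like this needs no knowledge of entry-point method names.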
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 09:47:16 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 09:47:16 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: <6lvLldbtBOQUw1f_3lz6yJ8G33tpzQolD95cBHHvhNY=.0b6f4c9d-01b3-4bd4-9638-68b2e28a7a65@github.com> References: <6lvLldbtBOQUw1f_3lz6yJ8G33tpzQolD95cBHHvhNY=.0b6f4c9d-01b3-4bd4-9638-68b2e28a7a65@github.com> Message-ID: <1vaEi8bGZ5D5woUEmxe_zYIOR138w4N9Mwcv_Hk6-Z0=.ee6c0d43-a9bc-46d6-8c7f-7a7cf71fb704@github.com> On Thu, 9 Mar 2023 09:36:28 GMT, Andrew Dinn wrote: > Yes, I appreciate that `dynamic` can be derived from `initializationMethod` -- and vice versa. However, I was approaching this semantically from the opposite end. To me the primary characteristic that the user would be interested in is whether the agent was loaded dynamically or on the command line (whatever the type of agent). The corresponding fact, for a Java agent, that it is entered, respectively, via method agentmain or premain is a derived (implementation) detail. Is there a reason to mention that detail? That's a good point. The overall intent was to map what method was measured during initialization. That included native agent callbacks as well. It may be an unnecessary implementation detail and may restrict the possibility of growth. It is probably a better design abstraction to leave out the specific method. I dropped the callback function for the native agent, but we should also do the same for JavaAgents. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 10:57:27 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 10:57:27 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v5] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents.
> > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. 
> > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. 
At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
> > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: remove implementation details ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/355d307c..f0c04055 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=03-04 Stats: 16 lines in 3 files changed: 9 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 11:33:18 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 9 Mar 2023 11:33:18 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v6] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. 
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization ? as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. 
> > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/f0c04055..80f22257 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=04-05 Stats: 5 lines in 2 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 11:48:23 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 9 Mar 2023 11:48:23 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v7] In-Reply-To: References: Message-ID: <3giPMTlZQoURo1LkJ6Bq5TAxt0WsrXqj_LFopPBon7U=.daf0ce0c-e4dd-4df6-b8f2-2f4408954a64@github.com> > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. 
> > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). 
This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
> > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/80f22257..d0609bfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=05-06 Stats: 15 lines in 3 files changed: 1 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 11:56:10 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 9 Mar 2023 11:56:10 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v8] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime.
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". 
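For readers who want to look at these events in a recording, here is a minimal Java sketch. It assumes the field names quoted in this description (`name`, `options`, `dynamic`, `initialization`); the names in a shipped JDK may differ, so events without an `initialization` field are skipped. The class name `AgentEventReport` and the helper `emissionLag` are made up for illustration; `RecordingFile.readAllEvents` and the `RecordedEvent` getters are the standard `jdk.jfr.consumer` API.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper class, not part of the PR.
final class AgentEventReport {

    // Time between the agent's "initialization" timestamp and the moment the
    // periodic event was emitted ("startTime"). Positive when initialization
    // happened first, which is the expected case.
    static Duration emissionLag(Instant initialization, Instant startTime) {
        return Duration.between(initialization, startTime);
    }

    // Prints one line per jdk.JavaAgent event found in the recording file.
    // Field names follow the description quoted above (an assumption).
    static void print(Path recording) throws IOException {
        for (RecordedEvent e : RecordingFile.readAllEvents(recording)) {
            if (!"jdk.JavaAgent".equals(e.getEventType().getName())) {
                continue;
            }
            if (!e.hasField("initialization")) {
                continue; // different field layout; skip rather than guess
            }
            Instant init = e.getInstant("initialization");
            System.out.printf("%s options=%s dynamic=%b lag=%s%n",
                    e.getString("name"), e.getString("options"),
                    e.getBoolean("dynamic"), emissionLag(init, e.getStartTime()));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            print(Path.of(args[0]));
        }
    }
}
```

With the command-line example above, the lag is 12:31:19.789 minus 12:31:15.574, i.e. 4.215 s, which illustrates why "initialization" is earlier than "startTime".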
> > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent.
This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: more cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/d0609bfb..db48fe8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=06-07 Stats: 4 lines in 3 files changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From lmesnik at openjdk.org Thu Mar 9 15:47:39 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 9 Mar 2023 15:47:39 GMT Subject: RFR: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests [v3] In-Reply-To: <1jQkL_QNYGBgL1OjiodwrCCu_dd0s5fVbXMDjjMxMdY=.1575c59b-1d8d-4007-9953-af2ff33f7293@github.com> References: <1jQkL_QNYGBgL1OjiodwrCCu_dd0s5fVbXMDjjMxMdY=.1575c59b-1d8d-4007-9953-af2ff33f7293@github.com> Message-ID: On Tue, 7 Mar 2023 19:00:25 GMT, Leonid Mesnik
wrote: >> Provide a way to start debuggee threads as platform or virtual for debuggee in com/sun/jdi tests. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed JdbStopThreadidTest to support new change. I've updated all the tests with additional threads that I found. ------------- PR: https://git.openjdk.org/jdk/pull/12894 From lmesnik at openjdk.org Thu Mar 9 15:47:42 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 9 Mar 2023 15:47:42 GMT Subject: Integrated: 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 23:47:09 GMT, Leonid Mesnik wrote: > Provide a way to start debuggee threads as platform or virtual for debuggee in com/sun/jdi tests. This pull request has now been integrated. Changeset: cdcf5c1e Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/cdcf5c1ed89505b6bf688fb255b493be4bbb13d2 Stats: 116 lines in 13 files changed: 20 ins; 30 del; 66 mod 8303702: Provide ThreadFactory to create platform/virtual threads for com/sun/jdi tests Reviewed-by: cjplummer, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12894 From coleenp at openjdk.org Thu Mar 9 15:56:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 15:56:12 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 19:25:58 GMT, Matias Saavedra Silva wrote: >> src/hotspot/share/oops/cpCache.cpp line 727: >> >>> 725: set_reference_map(nullptr); >>> 726: #if INCLUDE_CDS >>> 727: if (_initial_entries != nullptr) { >> >> @iklam with moving invokedynamic entries out, do you still need to save initialized entries? Does invokehandle need this?
(Should have separate RFE if more cleanup is possible) > > This along with the previous comment about `_invokedynamic_references_map` would probably be better suited for their own RFE. I think the scope of this PR should be limited to the indy structure and its implementation, so any changes related to invokehandle can be traced more easily. ok, that's fine. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From coleenp at openjdk.org Thu Mar 9 16:03:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 16:03:53 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 15:04:29 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 53: >> >>> 51: >>> 52: #undef __ >>> 53: #define __ Disassembler::hook(__FILE__, __LINE__, _masm)-> >> >> What is this? Is this something useful for debugging the template interpreter? Probably doesn't belong with this change but might be nice to have (?) @reinrich > > Yes this is really useful when debugging the template interpreter. It annotates the disassembly with the generator source code. It helped tracking down a bug in the ppc part oft this pr. Other platforms have it too. 
> > Example: > > invokedynamic 186 invokedynamic [0x00003fff80075a00, 0x00003fff80075dc8] 968 bytes > > -------------------------------------------------------------------------------- > 0x00003fff80075a00: std r17,0(r15) ;;@FILE: src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); > 0x00003fff80075a04: addi r15,r15,-8 > 0x00003fff80075a08: b 0x00003fff80075a40 ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); > 0x00003fff80075a0c: stfs f15,0(r15) ;; 2186: fep = __ pc(); __ push_f(); __ b(L); > 0x00003fff80075a10: addi r15,r15,-8 > 0x00003fff80075a14: b 0x00003fff80075a40 ;; 2186: fep = __ pc(); __ push_f(); __ b(L); > 0x00003fff80075a18: stfd f15,-8(r15) ;; 2187: dep = __ pc(); __ push_d(); __ b(L); > 0x00003fff80075a1c: addi r15,r15,-16 > 0x00003fff80075a20: b 0x00003fff80075a40 ;; 2187: dep = __ pc(); __ push_d(); __ b(L); > 0x00003fff80075a24: li r0,0 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); > 0x00003fff80075a28: std r0,0(r15) > 0x00003fff80075a2c: std r17,-8(r15) > 0x00003fff80075a30: addi r15,r15,-16 > 0x00003fff80075a34: b 0x00003fff80075a40 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); > 0x00003fff80075a38: stw r17,0(r15) ;; 2189: __ align(32, 12, 24); // align L > ;; 2191: iep = __ pc(); __ push_i(); > 0x00003fff80075a3c: addi r15,r15,-8 > 0x00003fff80075a40: li r21,1 ;; 2192: vep = __ pc(); > ;; 2193: __ bind(L); > ;;@FILE: src/hotspot/share/interpreter/templateInterpreterGenerator.cpp > ;; 366: __ verify_FPU(1, t->tos_in()); > ;;@FILE: src/hotspot/cpu/ppc/templateTable_ppc_64.cpp > ;; 2293: __ load_resolved_indy_entry(cache, index); > 0x00003fff80075a44: lwax r21,r14,r21 > 0x00003fff80075a48: nand r21,r21,r21 > 0x00003fff80075a4c: ld r31,40(r27) > 0x00003fff80075a50: rldicr r21,r21,4,59 > 0x00003fff80075a54: addi r21,r21,8 > 0x00003fff80075a58: add r31,r31,r21 > 0x00003fff80075a5c: ld r22,0(r31) ;; 2294: __ ld_ptr(method, in_bytes(ResolvedIndyEntry::method_offset()), cache); > 0x00003fff80075a60: 
cmpdi r22,0 ;; 2297: __ cmpdi(CCR0, method, 0); > 0x00003fff80075a64: bne- 0x00003fff80075b94 ;; 2298: __ bne(CCR0, resolved);,bo=0b00100[no_hint] > 0x00003fff80075a68: li r4,186 ;; 2304: __ li(R4_ARG2, code); > 0x00003fff80075a6c: ld r11,0(r1) ;; 2305: __ call_VM(noreg, entry, R4_ARG2, true); This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Does it affect performance when generating the template interpreter? Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From coleenp at openjdk.org Thu Mar 9 16:03:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 16:03:54 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com> On Thu, 9 Mar 2023 16:00:53 GMT, Coleen Phillimore wrote: >> Yes this is really useful when debugging the template interpreter. It annotates the disassembly with the generator source code. It helped tracking down a bug in the ppc part oft this pr. Other platforms have it too. 
>> >> Example: >> >> invokedynamic 186 invokedynamic [0x00003fff80075a00, 0x00003fff80075dc8] 968 bytes >> >> -------------------------------------------------------------------------------- >> 0x00003fff80075a00: std r17,0(r15) ;;@FILE: src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp >> ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); >> 0x00003fff80075a04: addi r15,r15,-8 >> 0x00003fff80075a08: b 0x00003fff80075a40 ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); >> 0x00003fff80075a0c: stfs f15,0(r15) ;; 2186: fep = __ pc(); __ push_f(); __ b(L); >> 0x00003fff80075a10: addi r15,r15,-8 >> 0x00003fff80075a14: b 0x00003fff80075a40 ;; 2186: fep = __ pc(); __ push_f(); __ b(L); >> 0x00003fff80075a18: stfd f15,-8(r15) ;; 2187: dep = __ pc(); __ push_d(); __ b(L); >> 0x00003fff80075a1c: addi r15,r15,-16 >> 0x00003fff80075a20: b 0x00003fff80075a40 ;; 2187: dep = __ pc(); __ push_d(); __ b(L); >> 0x00003fff80075a24: li r0,0 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); >> 0x00003fff80075a28: std r0,0(r15) >> 0x00003fff80075a2c: std r17,-8(r15) >> 0x00003fff80075a30: addi r15,r15,-16 >> 0x00003fff80075a34: b 0x00003fff80075a40 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); >> 0x00003fff80075a38: stw r17,0(r15) ;; 2189: __ align(32, 12, 24); // align L >> ;; 2191: iep = __ pc(); __ push_i(); >> 0x00003fff80075a3c: addi r15,r15,-8 >> 0x00003fff80075a40: li r21,1 ;; 2192: vep = __ pc(); >> ;; 2193: __ bind(L); >> ;;@FILE: src/hotspot/share/interpreter/templateInterpreterGenerator.cpp >> ;; 366: __ verify_FPU(1, t->tos_in()); >> ;;@FILE: src/hotspot/cpu/ppc/templateTable_ppc_64.cpp >> ;; 2293: __ load_resolved_indy_entry(cache, index); >> 0x00003fff80075a44: lwax r21,r14,r21 >> 0x00003fff80075a48: nand r21,r21,r21 >> 0x00003fff80075a4c: ld r31,40(r27) >> 0x00003fff80075a50: rldicr r21,r21,4,59 >> 0x00003fff80075a54: addi r21,r21,8 >> 0x00003fff80075a58: add r31,r31,r21 >> 0x00003fff80075a5c: ld r22,0(r31) ;; 2294: __ ld_ptr(method, 
in_bytes(ResolvedIndyEntry::method_offset()), cache); >> 0x00003fff80075a60: cmpdi r22,0 ;; 2297: __ cmpdi(CCR0, method, 0); >> 0x00003fff80075a64: bne- 0x00003fff80075b94 ;; 2298: __ bne(CCR0, resolved);,bo=0b00100[no_hint] >> 0x00003fff80075a68: li r4,186 ;; 2304: __ li(R4_ARG2, code); >> 0x00003fff80075a6c: ld r11,0(r1) ;; 2305: __ call_VM(noreg, entry, R4_ARG2, true); > > This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Does it affect performance when generating the template interpreter? Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. This looks cool. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Thu Mar 9 16:47:09 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 9 Mar 2023 16:47:09 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com> References: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com> Message-ID: On Thu, 9 Mar 2023 16:01:21 GMT, Coleen Phillimore wrote: >> This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Does it affect performance when generating the template interpreter? Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. > > This looks cool. > This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Ok. > Does it affect performance when generating the template interpreter? I didn't think it would affect performance if the interpreter is not printed. I have not measured it though. 
> Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. Yes you do. It is not working with the AbstractDisassembler which produces a hex dump of the machine code. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From mgronlun at openjdk.org Thu Mar 9 16:58:42 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 9 Mar 2023 16:58:42 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v9] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. 
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
> > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: handle multiple envs with same VMInit callback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/db48fe8d..abeaa324 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=07-08 Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From naoto at openjdk.org Thu Mar 9 18:15:41 2023 From: naoto at openjdk.org (Naoto Sato) Date: Thu, 9 Mar 2023 18:15:41 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:30:27 GMT, Daniel Jeliński wrote: > This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. > > Other changes include: > - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. > - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set.
> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. > - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. > - `getLastErrorString` is no longer exported by libjava. > > Tier1-3 tests continue to pass. > > No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. > Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. Looks good (w/ some minor comments) src/java.base/share/native/libzip/zip_util.c line 767: > 765: * or NULL if an error occurred. If a zip error occurred then *pmsg will > 766: * be set to the error message text if pmsg != 0. Otherwise, *pmsg will be > 767: * set to NULL. Caller doesn't need to free the error message. I'd add some more context here on why the caller does not need to free it (it is static text). src/java.base/windows/native/libjava/jni_util_md.c line 80: > 78: 0, > 79: buf, > 80: sizeof(buf) / sizeof(WCHAR), Maybe better to #define the size 256 so that this division is not needed. src/java.base/windows/native/libnio/ch/FileDispatcherImpl.c line 208: > 206: > 207: if (result == 0) { > 208: JNU_ThrowIOExceptionWithLastError(env, "Write failed"); Could be replaced with `JNU_ThrowIOException`?
src/java.base/windows/native/libnio/ch/FileDispatcherImpl.c line 260: > 258: > 259: if (result == 0) { > 260: JNU_ThrowIOExceptionWithLastError(env, "Write failed"); Same as above src/java.base/windows/native/libnio/ch/FileDispatcherImpl.c line 299: > 297: > 298: if (result == 0) { > 299: JNU_ThrowIOExceptionWithLastError(env, "Write failed"); Same here ------------- PR: https://git.openjdk.org/jdk/pull/12922 From dcubed at openjdk.org Thu Mar 9 18:54:49 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 18:54:49 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v5] In-Reply-To: <-ImSJ7DFBeTtKn-R9IJcFE8wreHtVHYxWBv743xPa8s=.6ced1034-d7e1-4e23-a53d-81cbda44361a@github.com> References: <-ImSJ7DFBeTtKn-R9IJcFE8wreHtVHYxWBv743xPa8s=.6ced1034-d7e1-4e23-a53d-81cbda44361a@github.com> Message-ID: On Mon, 30 Jan 2023 14:30:41 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 1336: >> >>> 1334: // Success! Return inflated monitor. >>> 1335: if (own) { >>> 1336: assert(current->is_Java_thread(), "must be: checked in is_lock_owned()"); >> >> `is_lock_owned()` currently does this: >> >> >> static bool is_lock_owned(Thread* thread, oop obj) { >> assert(UseFastLocking, "only call this with fast-locking enabled"); >> return thread->is_Java_thread() ? reinterpret_cast<JavaThread*>(thread)->lock_stack().contains(obj) : false; >> } >> >> >> so I would not say "checked in is_lock_owned()" since `is_lock_owned()` does >> not enforce that the caller is a JavaThread. > > If it's not a Java thread, `is_lock_owned()` returns `false`, and we wouldn't end up in the `if (own)` branch. Okay, I get it. `is_lock_owned()` only returns `true` when called by a JavaThread and if that JavaThread owns the monitor. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 9 19:00:56 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Thu, 9 Mar 2023 19:00:56 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v15] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 18:25:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?'
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. 
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. 
>> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Inline initial LockStack stack This project is still currently baselined on jdk-21+10-761. I was expecting this merge: [Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2](https://github.com/openjdk/jdk/pull/10907/commits/3c9d0d822fc15a196c4b8920b89ad6d3d0547101) to sync in the latest main baseline bits, but apparently not. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From amenkov at openjdk.org Thu Mar 9 19:16:46 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 9 Mar 2023 19:16:46 GMT Subject: Integrated: 8303489: Add a test to verify classes in vmStruct have unique vtables In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 02:41:12 GMT, Alex Menkov wrote: > Unique vtables for classes in vmStruct data is a requirement for SA to correctly detect hotspot classes. > The fix adds test to verify this requirement. > > The test fails as expected on Windows if VM is built without RTTI (see JDK-8302817) This pull request has now been integrated. 
Changeset: f9aadb94
Author: Alex Menkov
URL: https://git.openjdk.org/jdk/commit/f9aadb943cb90382a631a5cafd0624d4e8a47789
Stats: 180 lines in 2 files changed: 178 ins; 0 del; 2 mod

8303489: Add a test to verify classes in vmStruct have unique vtables

Reviewed-by: cjplummer, sspitsyn

-------------

PR: https://git.openjdk.org/jdk/pull/12820

From pchilanomate at openjdk.org Thu Mar 9 19:19:24 2023
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 9 Mar 2023 19:19:24 GMT
Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode
Message-ID: 

Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active JvmtiVTMSTransitionDisabler. The code checks for active "all-disablers", but it is missing the check for active "single disablers". I attached a simple reproducer to the bug, which I used to test the patch. I was not sure it was worth adding a test, so the patch contains just the fix.
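The condition described above can be sketched with a small standalone model. This is only an illustration under stated assumptions: the class, the two counters, and all method names below are hypothetical and are not taken from the HotSpot sources.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model of the disabler bookkeeping described above.
// None of these names come from HotSpot.
class DisablerModel {
 public:
  // A disabler that targets one virtual thread becomes active / inactive.
  void single_disabler_enter() { std::lock_guard<std::mutex> g(_lock); ++_single_count; }
  void single_disabler_exit()  { std::lock_guard<std::mutex> g(_lock); --_single_count; }

  // A disabler that targets all virtual threads becomes active / inactive.
  void all_disabler_enter() { std::lock_guard<std::mutex> g(_lock); ++_all_count; }
  void all_disabler_exit()  { std::lock_guard<std::mutex> g(_lock); --_all_count; }

  // The check as described for the code before the fix:
  // only active "all-disablers" are considered.
  bool suspender_may_proceed_all_only() const {
    std::lock_guard<std::mutex> g(_lock);
    return _all_count == 0;
  }

  // The check with the missing condition added: a suspender is a
  // monopolist, so no disabler of either kind may be active.
  bool suspender_may_proceed() const {
    std::lock_guard<std::mutex> g(_lock);
    return _all_count == 0 && _single_count == 0;
  }

 private:
  mutable std::mutex _lock;
  int _all_count = 0;
  int _single_count = 0;
};

// Usage: one active single disabler is enough to hold back a suspender.
void disabler_example() {
  DisablerModel model;
  model.single_disabler_enter();
  assert(model.suspender_may_proceed_all_only());  // incomplete check lets it through
  assert(!model.suspender_may_proceed());          // complete check holds it back
  model.single_disabler_exit();
  assert(model.suspender_may_proceed());
}
```

In the model, the only difference between the two predicates is the extra term for single disablers, which mirrors the one-line nature of the fix as described.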
Thanks,
Patricio

-------------

Commit messages:
 - v1

Changes: https://git.openjdk.org/jdk/pull/12956/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12956&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8303908
Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/12956.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12956/head:pull/12956

PR: https://git.openjdk.org/jdk/pull/12956

From fparain at openjdk.org Thu Mar 9 19:27:32 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Thu, 9 Mar 2023 19:27:32 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry
In-Reply-To: 
References: 
Message-ID: <89PNDhKGJttxXQg3Izv3dvSM33KewEekz4kmwVUjQXo=.d7fc365c-9783-4e5a-ac04-ba770a51c43c@github.com>

On Mon, 27 Feb 2023 21:37:34 GMT, Matias Saavedra Silva wrote:

> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
> 
> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
> 
> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
> 
> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> 
> This change supports the following platforms: x86, aarch64, PPC, and RISCV

src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1844:

> 1842:   // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo)
> 1843:   mov(tmp, sizeof(ResolvedIndyEntry));
> 1844:   mul(index, index, tmp);

On 64-bit platforms, sizeof(ResolvedIndyEntry) is 16, a power of two, so a shift instruction could be used instead of a multiply instruction (with an assert in case the size of ResolvedIndyEntry is changed).

src/hotspot/cpu/x86/interp_masm_x86.cpp line 2075:

> 2073:   movptr(cache, Address(rbp, frame::interpreter_frame_cache_offset * wordSize));
> 2074:   movptr(cache, Address(cache, in_bytes(ConstantPoolCache::invokedynamic_entries_offset())));
> 2075:   imull(index, index, sizeof(ResolvedIndyEntry)); // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo)

A shift instruction could be used when sizeof(ResolvedIndyEntry) is a power of two. It is on x86_64 platforms but not on x86_32 platforms (both are using this file).
Suggested change:

    if (is_power_of_2(sizeof(ResolvedIndyEntry))) {
      shll(index, log2i(sizeof(ResolvedIndyEntry)));
    } else {
      imull(index, index, sizeof(ResolvedIndyEntry)); // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo)
    }

src/hotspot/cpu/x86/templateTable_x86.cpp line 2747:

> 2745:   address entry = CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_from_cache);
> 2746:   __ movl(method, code); // this is essentially Bytecodes::_invokedynamic
> 2747:   __ call_VM(noreg, entry, method); // Example uses temp = rbx. In this case rbx is method

The comment is confusing and seems to need an update. The register 'method' is used, but its content is no longer the method; it is the bytecode.

src/hotspot/cpu/x86/templateTable_x86.cpp line 2770:

> 2768:   // since the parameter_size includes it.
> 2769:   __ push(rbx);
> 2770:   __ mov(rbx, index);

Why is the index (rdx) copied to rbx instead of using the index (rdx) register directly to call load_resolved_reference_at_index()? The method doesn't modify the content of the register.

src/hotspot/share/interpreter/bootstrapInfo.cpp line 67:

> 65:   assert(_indy_index != -1, "");
> 66:   // Check if method is not null
> 67:   if ( _pool->resolved_indy_entry_at(_indy_index)->method() != nullptr) {

_pool->resolved_reference_from_indy(_indy_index) is repeated 5 times. Using a local variable would make the code easier to read.

-------------

PR: https://git.openjdk.org/jdk/pull/12778

From coleenp at openjdk.org Thu Mar 9 20:43:09 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 9 Mar 2023 20:43:09 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry
In-Reply-To: 
References: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com>
Message-ID: 

On Thu, 9 Mar 2023 16:44:21 GMT, Richard Reingruber wrote:

>> This looks cool.
> 
>> This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too).
> 
> Ok.
> 
>> Does it affect performance when generating the template interpreter?
> 
> I didn't think it would affect performance if the interpreter is not printed. I have not measured it though.
> 
>> Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place.
> 
> Yes you do. It is not working with the AbstractDisassembler which produces a hex dump of the machine code.

I was searching in the wrong directory, which is why I didn't find this and apparently I reviewed this change in 2018. We can leave this change here. Sorry for the noise.
-------------

PR: https://git.openjdk.org/jdk/pull/12778

From fparain at openjdk.org Thu Mar 9 20:51:38 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Thu, 9 Mar 2023 20:51:38 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams
In-Reply-To: 
References: 
Message-ID: 

On Wed, 8 Mar 2023 15:46:12 GMT, Coleen Phillimore wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>> 
>> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of the lack of strong typing and semantic overloading, and poor memory efficiency.
>> 
>> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>> 
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>> 
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM.
>> 
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>> 
>> Tested with mach5, tier 1 to 7.
>> 
>> Thank you.
> 
> src/hotspot/share/classfile/classFileParser.cpp line 1491:
> 
>> 1489:   _temp_field_info = new GrowableArray(total_fields);
>> 1490: 
>> 1491:   ResourceMark rm(THREAD);
> 
> Is the ResourceMark ok here or should it go before allocating _temp_field_info ?

_temp_field_info must survive after ClassFileParser::parse_fields() has returned, so the ResourceMark would definitely have to go after the allocation of _temp_field_info. That being said, I don't see any reason to have a ResourceMark here; it is probably a leftover from some debugging/tracing code. I'll remove it.

-------------

PR: https://git.openjdk.org/jdk/pull/12855

From fparain at openjdk.org Thu Mar 9 21:02:05 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Thu, 9 Mar 2023 21:02:05 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams
In-Reply-To: 
References: 
Message-ID: 

On Wed, 8 Mar 2023 15:50:05 GMT, Coleen Phillimore wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>> 
>> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of the lack of strong typing and semantic overloading, and poor memory efficiency.
>> 
>> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>> 
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>> 
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM.
>> 
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>> 
>> Tested with mach5, tier 1 to 7.
>> 
>> Thank you.
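
To illustrate the general idea of a variable-sized stream mentioned in the description above, here is a minimal sketch. It assumes a generic LEB128-style variable-length encoding, which is deliberately not the UNSIGNED5 encoding used by the JVM; the function names and the three sample values (standing in for a name index, a signature index, and an offset) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LEB128-style encoding (an assumption for illustration, not UNSIGNED5):
// 7 payload bits per byte, with the high bit set on every byte but the last.
void put_varint(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));  // low 7 bits + continuation bit
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));           // final byte, high bit clear
}

// Reads one value starting at 'pos' and advances 'pos' past it.
// The caller must ensure the stream holds a complete value at 'pos'.
uint32_t get_varint(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint32_t result = 0;
  int shift = 0;
  while (true) {
    uint8_t b = in[pos++];
    result |= static_cast<uint32_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) {
      return result;
    }
    shift += 7;
  }
}

// Usage: three hypothetical field attributes packed back to back.
void varint_example() {
  std::vector<uint8_t> stream;
  put_varint(stream, 3);      // fits in 1 byte
  put_varint(stream, 300);    // needs 2 bytes
  put_varint(stream, 70000);  // needs 3 bytes
  assert(stream.size() == 6); // versus 12 bytes as three fixed 32-bit words

  std::size_t pos = 0;
  assert(get_varint(stream, pos) == 3);
  assert(get_varint(stream, pos) == 300);
  assert(get_varint(stream, pos) == 70000);
  assert(pos == stream.size());
}
```

Small values dominate in this kind of metadata, which is why a byte-oriented variable-length encoding tends to be denser than fixed-width slots; the trade-off is that entries can no longer be located by index without decoding the stream.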
> 
> src/hotspot/share/classfile/classFileParser.cpp line 1608:
> 
>> 1606:   fflags.update_injected(true);
>> 1607:   AccessFlags aflags;
>> 1608:   FieldInfo fi(aflags, (u2)(injected[n].name_index), (u2)(injected[n].signature_index), 0, fflags);
> 
> I don't know why there's a cast here until I read more. If the FieldInfo name_index and signature_index fields are only u2 sized, could you pass this as an int and then in the constructor assert that the value doesn't overflow u2 instead?

The type of name_index and signature_index is const vmSymbolID, because the names and signatures of injected fields do not come from a constant pool, but from the vmSymbol array.

-------------

PR: https://git.openjdk.org/jdk/pull/12855

From rkennke at openjdk.org Thu Mar 9 21:08:16 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 9 Mar 2023 21:08:16 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16]
In-Reply-To: 
References: 
Message-ID: 

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
> 
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
> 
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
> 
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
> 
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
> 
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: - Merge branch 'master' into JDK-8291555-v2 - Various small fixes and improvements - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 - Use realloc instead of malloc+copy when growing the lock-stack - Inline initial LockStack stack - Fix interpreter asymmetric fast-locking - Fix merge error (move done label into correct places) - Merge branch 'master' into JDK-8291555-v2 - Small fixes - Fix anon owner in fast-path, avoid runtime call (aarch64) - ... 
and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36

-------------

Changes: https://git.openjdk.org/jdk/pull/10907/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=15
Stats: 2090 lines in 74 files changed: 1327 ins; 94 del; 669 mod
Patch: https://git.openjdk.org/jdk/pull/10907.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From fparain at openjdk.org Thu Mar 9 21:12:11 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Thu, 9 Mar 2023 21:12:11 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams
In-Reply-To: 
References: 
Message-ID: <0Ayok_tvKtFDl6deMwvAIUcT8MFkA8fIXSCHqwXKYts=.2cf55fae-34ae-4ce3-b733-b32df0acaf45@github.com>

On Wed, 8 Mar 2023 16:05:57 GMT, Coleen Phillimore wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>> 
>> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of the lack of strong typing and semantic overloading, and poor memory efficiency.
>> 
>> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>> 
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>> 
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM.
>> 
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>> 
>> Tested with mach5, tier 1 to 7.
>> 
>> Thank you.
> 
> src/hotspot/share/classfile/fieldLayoutBuilder.cpp line 554:
> 
>> 552:   FieldInfo ctrl = _field_info->at(0);
>> 553:   FieldGroup* group = nullptr;
>> 554:   FieldInfo tfi = *it;
> 
> What's the 't' in tfi? Maybe a longer variable name would be helpful here.

At some point there was a TempFieldInfo type, hence the name. Renamed to fieldinfo.

> src/hotspot/share/classfile/javaClasses.cpp line 871:
> 
>> 869:   // a new UNSIGNED5 stream, and substitute it to the old FieldInfo stream.
>> 870: 
>> 871:   int java_fields;
> 
> Can you put InstanceKlass* ik = InstanceKlass::cast(k); here and use that so there's only one cast?

Sure, done.

-------------

PR: https://git.openjdk.org/jdk/pull/12855

From matsaave at openjdk.org Thu Mar 9 21:18:19 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Thu, 9 Mar 2023 21:18:19 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2]
In-Reply-To: 
References: 
Message-ID: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com>

> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
> 
> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
> > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Interpreter optimization and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/829517d6..c2d87e59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=00-01 Stats: 47 lines in 10 files changed: 11 ins; 13 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From amenkov at openjdk.org Thu Mar 9 21:52:48 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 9 Mar 2023 21:52:48 GMT Subject: RFR: 8303924: ProblemList serviceability/sa/UniqueVtableTest.java on Linux Message-ID: The test fails intermittently on linux-x64-debug and linux-aarch64-debug ------------- Commit messages: - problemlist UniqueVtableTest on linux Changes: https://git.openjdk.org/jdk/pull/12962/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12962&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303924 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12962.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12962/head:pull/12962 PR: https://git.openjdk.org/jdk/pull/12962 From dcubed at openjdk.org Thu Mar 9 21:52:49 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 9 Mar 2023 21:52:49 GMT Subject: RFR: 8303924: ProblemList serviceability/sa/UniqueVtableTest.java on Linux In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 21:36:32 GMT, Alex Menkov wrote: > The test fails intermittently on linux-x64-debug and linux-aarch64-debug Thumbs up. This is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/12962 From cjplummer at openjdk.org Thu Mar 9 21:56:04 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 9 Mar 2023 21:56:04 GMT Subject: RFR: 8303609: ProblemList serviceability/sa/TestSysProps.java with ZGC In-Reply-To: References: Message-ID: <3Qb__fUTyNnDgYKCsOa3TZ6p9Oat82-ugQUccOLH_mU=.e8ed8db3-5b5a-44cf-9c32-c16d3875b0ac@github.com> On Wed, 8 Mar 2023 22:40:56 GMT, Chris Plummer wrote: > Although it takes both ZGC and -Xcomp to cause the test to fail, we have no way to problem list for just that combination, so I'm choosing the problem list with just ZGC since it is the main cause of the failure. I've only seen this issue on windows-x64, but there are clearly failures on linux and macos in mach5, so I'm problem listing for all platforms. Thanks Dan! ------------- PR: https://git.openjdk.org/jdk/pull/12935 From cjplummer at openjdk.org Thu Mar 9 21:58:18 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 9 Mar 2023 21:58:18 GMT Subject: RFR: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" [v2] In-Reply-To: <5hE-xsWLY1E1Lg8hZY0PvTB8-Mz7tg-iKXWrGfpoSLk=.19c50e04-2d07-4019-aabb-c121c0019709@github.com> References: <5hE-xsWLY1E1Lg8hZY0PvTB8-Mz7tg-iKXWrGfpoSLk=.19c50e04-2d07-4019-aabb-c121c0019709@github.com> Message-ID: On Wed, 8 Mar 2023 22:39:55 GMT, Chris Plummer wrote: >> The test failure is caused by the arrival of unexpected ThreadStartEvents, which mess up the debugger side. 
The events are for threads we normally only see getting created when using virtual threads, such as carrier threads and the VirtualThread-unparker thread. Theoretically this issue could happen without virtual threads due to other VM threads starting up such as Common-Cleaner, but we haven't seen it fail for that reason yet. >> >> The test is testing proper thread suspension for ThreadStartEvent using each of the 3 suspension policy types. The debuggee creates a sequence of 3 debuggee threads, each one's timing coordinated with some complicated synchronization with the debugger using breakpoints and the setting of fields in the debuggee (and careful placement of suspend/resume in the debugger). The ThreadStartRequests that the debugger sets up always use a "count filter" of 1, which means the requests are good for delivering exactly 1 ThreadStartEvent, and any that come after the first will get filtered out. So when an unexpected ThreadStartEvent arrives for something like a carrier thread, this prematurely moves the debugger on to the next step, and the synchronization with the debuggee gets messed up. >> >> The first step in fixing this test was to remove the count filter, so the request can handle any number of ThreadStartEvents. >> >> The next step was then fixing the test library code in EventHandler.java so it would filter out any undesired ThreadStartEvents, so the test just ends up getting one event, and always for the thread it is expecting. There are a few parts to this. One is improving EventFilters.filter() to filter out more threads that tend to be created during VM startup, including carrier threads and the VirtualThread-unparker thread. >> >> It was necessary to add some calls to EventFilters.filter() from EventHandler. This was done by adding a ThreadStartEvent listener for the "spurious" thread starts (those the test debuggee does not create).
This listener is added by waitForRequestedEventCommon(), which is indirectly called by the test when it calls waitForRequestedEventSet(). >> >> There is also a 2nd place where the ThreadStartEvent listener for "spurious" threads is needed. It is now also installed with the default listeners that are always in place. It is needed when the test is not actually waiting for a ThreadStartEvent, but is waiting for a BreakpointEvent. waitForRequestedEventCommon() is not used in this case (so none of its listeners are installed), but the default listeners are always in place and can be used to filter these ThreadStartEvents. Note this filter will also be in place when calling waitForRequestedEventCommon(), but we can't rely on it when waitForRequestedEventCommon() is used for ThreadStartEvents because the spurious ThreadStartEvent will be seen and returned before we ever get to the default filter. So we actually end up with this ThreadStartEvent listener installed twice during waitForRequestedEventCommon(). >> >> I did a bit of cleanup on the test, mostly renaming of threads and ThreadStartRequests so they are easier to match up with the iteration # we use in both the debuggee and debugger (0, 1, and 2). The only real change in the test itself is removing the filter count, and verifying that the ThreadStartEvent is for the expected thread. > > Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: > > Fix a couple of typos Thanks Kevin and Serguei!
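The difference between the old count filter and the name-based filtering described in the quoted text can be sketched in plain Java, without a live JDI connection. This is a minimal model, not the test library code: the class and method names are hypothetical, the thread names in `KNOWN_VM_THREADS` are taken from the description above, and the carrier-thread prefix `ForkJoinPool-1-worker-` is an assumption about the default virtual thread scheduler's naming.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the two strategies; events are represented only by
// the name of the thread that started.
final class ThreadStartFilterSketch {
    // Threads the VM may start on its own (names from the description above).
    private static final Set<String> KNOWN_VM_THREADS =
            Set.of("Common-Cleaner", "VirtualThread-unparker");

    // Assumed name prefix of carrier threads.
    private static final String CARRIER_PREFIX = "ForkJoinPool-1-worker-";

    // True if the thread start was not caused by the test debuggee.
    static boolean isSpurious(String threadName) {
        return KNOWN_VM_THREADS.contains(threadName)
                || threadName.startsWith(CARRIER_PREFIX);
    }

    // Old behaviour: a count filter of 1 delivers whichever event arrives
    // first, expected or not, and drops everything after it.
    static String deliveredWithCountFilterOne(List<String> events) {
        return events.isEmpty() ? null : events.get(0);
    }

    // New behaviour: no count filter; spurious events are skipped and the
    // first remaining event is delivered (null if there is none).
    static String deliveredWithNameFilter(List<String> events) {
        for (String name : events) {
            if (!isSpurious(name)) {
                return name;
            }
        }
        return null;
    }
}
```

With an event sequence such as a carrier thread start, then VirtualThread-unparker, then the debuggee's own thread, the count-filter variant hands back the carrier thread, which is the failure mode described above, while the name-filter variant hands back the debuggee thread.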
------------- PR: https://git.openjdk.org/jdk/pull/12861 From cjplummer at openjdk.org Thu Mar 9 21:58:19 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 9 Mar 2023 21:58:19 GMT Subject: Integrated: 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" In-Reply-To: References: Message-ID: <92sh2-p32EwP56vvcRPN6bEoBDmlUzieKo_CjDTktrQ=.79ea35e4-dc97-4dd7-82ec-3b42b797cfbe@github.com> On Fri, 3 Mar 2023 18:16:25 GMT, Chris Plummer wrote: > The test failure is caused by the arrival of unexpected ThreadStartEvents, which mess up the debugger side. The events are for threads we normally only see getting created when using virtual threads, such as carrier threads and the VirtualThread-unparker thread. Theoretically this issue could happen without virtual threads due to other VM threads starting up such as Common-Cleaner, but we haven't seen it fail for that reason yet. > > The test is testing proper thread suspension for ThreadStartEvent using each of the 3 suspension policy types. The debuggee creates a sequence of 3 debuggee threads, each one's timing coordinated with some complicated synchronization with the debugger using breakpoints and the setting of fields in the debuggee (and careful placement of suspend/resume in the debugger). The ThreadStartRequests that the debugger sets up always use a "count filter" of 1, which means the requests are good for delivering exactly 1 ThreadStartEvent, and any that come after the first will get filtered out. So when an unexpected ThreadStartEvent arrives for something like a carrier thread, this prematurely moves the debugger on to the next step, and the synchronization with the debuggee gets messed up. > > The first step in fixing this test was to remove the count filter, so the request can handle any number of ThreadStartEvents.
> > The next step was then fixing the test library code in EventHandler.java so it would filter out any undesired ThreadStartEvents, so the test just ends up getting one event, and always for the thread it is expecting. There are a few parts to this. One is improving EventFilters.filter() to filter out more threads that tend to be created during VM startup, including carrier threads and the VirtualThread-unparker thread. > > It was necessary to add some calls to EventFilters.filter() from EventHandler. This was done by adding a ThreadStartEvent listener for the "spurious" thread starts (those the test debuggee does not create). This listener is added by waitForRequestedEventCommon(), which is indirectly called by the test when it calls waitForRequestedEventSet(). > > There is also a 2nd place where the ThreadStartEvent listener for "spurious" threads is needed. It is now also installed with the default listeners that are always in place. It is needed when the test is not actually waiting for a ThreadStartEvent, but is waiting for a BreakpointEvent. waitForRequestedEventCommon() is not used in this case (so none of its listeners are installed), but the default listeners are always in place and can be used to filter these ThreadStartEvents. Note this filter will also be in place when calling waitForRequestedEventCommon(), but we can't rely on it when waitForRequestedEventCommon() is used for ThreadStartEvents because the spurious ThreadStartEvent will be seen and returned before we ever get to the default filter. So we actually end up with this ThreadStartEvent listener installed twice during waitForRequestedEventCommon(). > > I did a bit of cleanup on the test, mostly renaming of threads and ThreadStartRequests so they are easier to match up with the iteration # we use in both the debuggee and debugger (0, 1, and 2). The only real change in the test itself is removing the filter count, and verifying that the ThreadStartEvent is for the expected thread.
This pull request has now been integrated. Changeset: 8b0eb729 Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/8b0eb7299a5d0e142453ed5c7a17308077e27993 Stats: 97 lines in 4 files changed: 73 ins; 1 del; 23 mod 8289765: JDI EventSet/resume/resume008 failed with "ERROR: suspendCounts don't match for : VirtualThread-unparker" Reviewed-by: sspitsyn, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/12861 From cjplummer at openjdk.org Thu Mar 9 21:59:25 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 9 Mar 2023 21:59:25 GMT Subject: Integrated: 8303609: ProblemList serviceability/sa/TestSysProps.java with ZGC In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 22:40:56 GMT, Chris Plummer wrote: > Although it takes both ZGC and -Xcomp to cause the test to fail, we have no way to problem list for just that combination, so I'm choosing the problem list with just ZGC since it is the main cause of the failure. I've only seen this issue on windows-x64, but there are clearly failures on linux and macos in mach5, so I'm problem listing for all platforms. This pull request has now been integrated. Changeset: af0ca78a Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/af0ca78a8f8108fd81dcdfaa6b8a43a940942633 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8303609: ProblemList serviceability/sa/TestSysProps.java with ZGC Reviewed-by: dcubed ------------- PR: https://git.openjdk.org/jdk/pull/12935 From amenkov at openjdk.org Thu Mar 9 22:00:30 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 9 Mar 2023 22:00:30 GMT Subject: Integrated: 8303924: ProblemList serviceability/sa/UniqueVtableTest.java on Linux In-Reply-To: References: Message-ID: <2qQFOLbI-_P22f7Aq2hTYNZjMPMwZnb_DELzqer1EHE=.574c2590-e290-43ee-8534-7eec3e9aa915@github.com> On Thu, 9 Mar 2023 21:36:32 GMT, Alex Menkov wrote: > The test fails intermittently on linux-x64-debug and linux-aarch64-debug This pull request has now been integrated. 
Changeset: e930b63a Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/e930b63a8f166502740bca45e3d022f69fc04b53 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8303924: ProblemList serviceability/sa/UniqueVtableTest.java on Linux Reviewed-by: dcubed ------------- PR: https://git.openjdk.org/jdk/pull/12962 From dcubed at openjdk.org Thu Mar 9 22:02:41 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 22:02:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 21:08:16 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: > > - Merge branch 'master' into JDK-8291555-v2 > - Various small fixes and improvements > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use realloc instead of malloc+copy when growing the lock-stack > - Inline initial LockStack stack > - Fix interpreter asymmetric fast-locking > - Fix merge error (move done label into correct places) > - Merge branch 'master' into JDK-8291555-v2 > - Small fixes > - Fix anon owner in fast-path, avoid runtime call (aarch64) > - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 This project is currently based on jdk-21+14-1079. 
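The per-thread lock-stack described in the quoted PR text can be modelled in a few lines of Java. This is a sketch under stated assumptions, not the HotSpot implementation: the real structure holds oops and is written in C++, the header-bit CAS and the inflation to a full monitor are left out, and the class name `LockStackSketch`, its methods, and the initial capacity of 4 are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical model of a thread-local lock-stack: a small growable array of
// object references, queried by identity.
final class LockStackSketch {
    private Object[] elems = new Object[4]; // small initial capacity (assumed value)
    private int top = 0;                    // number of entries currently in use

    // Records that the owning thread has fast-locked obj, growing the array
    // first if it is full (the "check for potential overflow" step).
    void push(Object obj) {
        if (top == elems.length) {
            elems = Arrays.copyOf(elems, elems.length * 2);
        }
        elems[top++] = obj;
    }

    // Removes and returns the most recently locked object.
    Object pop() {
        if (top == 0) {
            throw new IllegalStateException("lock-stack is empty");
        }
        Object obj = elems[--top];
        elems[top] = null; // drop the reference so it is no longer kept alive
        return obj;
    }

    // Answers "does the current thread own this lock?" with a linear scan
    // that compares references, not equals().
    boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (elems[i] == obj) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return top;
    }

    int capacity() {
        return elems.length;
    }
}
```

Because the array is expected to stay at a handful of entries, a linear identity scan is a reasonable way to answer the ownership question; answering "which thread owns X?" would mean scanning the lock-stacks of all threads, which matches the text's remark that such queries belong on slower paths.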
------------- PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Thu Mar 9 22:27:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 9 Mar 2023 22:27:11 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: <9TvBQ8ScbPmz7uMFo8fWYly239z03-tYBsNWV1N0s4A=.344c11c7-c375-4e75-a4b2-7ed1592971bc@github.com> On Tue, 7 Mar 2023 22:14:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(), in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we found a return barrier pc (as in Continuation::is_return_barrier_entry()) and we want to keep walking the physical stack then we know the sender will be the enterSpecial frame so we create it by calling ContinuationEntry::to_frame(). This method assumes there can only be one nmethod associated with enterSpecial() so we hit an assert later on. See the bug for more details of the crash. > > As I mentioned in the bug we don't need to rely on this assumption since we can re-read the updated value from _enter_special. But reading both _enter_special and _return_pc means we would need some kind of synchronization since to_frame() could be called concurrently with set_enter_code(). To avoid that we could just read _return_pc and calculate the blob from it each time, but I'm also assuming that overhead is undesired and that's why the static variable was introduced. 
Alternatively _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). But if we go this route I think we would need to do a small fix on thaw too. After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). I'm not sure about the lifecycle of the old CodeBlob but it seems it could have been already removed if enterSpecial was not found while traversing everybody's stack. Maybe there are more issues. > > The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another less restrictive option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired. > > I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test. > > Thanks, > Patricio Looks good. Thank you for taking care about it! Serguei, Thanks ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12911 From pchilanomate at openjdk.org Thu Mar 9 22:38:10 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 9 Mar 2023 22:38:10 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: <9TvBQ8ScbPmz7uMFo8fWYly239z03-tYBsNWV1N0s4A=.344c11c7-c375-4e75-a4b2-7ed1592971bc@github.com> References: <9TvBQ8ScbPmz7uMFo8fWYly239z03-tYBsNWV1N0s4A=.344c11c7-c375-4e75-a4b2-7ed1592971bc@github.com> Message-ID: On Thu, 9 Mar 2023 22:24:00 GMT, Serguei Spitsyn wrote: > Looks good. Thank you for taking care about it! Serguei, Thanks > Thanks for the review Serguei!
------------- PR: https://git.openjdk.org/jdk/pull/12911 From pchilanomate at openjdk.org Thu Mar 9 22:57:22 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 9 Mar 2023 22:57:22 GMT Subject: Integrated: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 22:14:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(), in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we found a return barrier pc (as in Continuation::is_return_barrier_entry()) and we want to keep walking the physical stack then we know the sender will be the enterSpecial frame so we create it by calling ContinuationEntry::to_frame(). This method assumes there can only be one nmethod associated with enterSpecial() so we hit an assert later on. See the bug for more details of the crash. > > As I mentioned in the bug we don't need to rely on this assumption since we can re-read the updated value from _enter_special. But reading both _enter_special and _return_pc means we would need some kind of synchronization since to_frame() could be called concurrently with set_enter_code(). To avoid that we could just read _return_pc and calculate the blob from it each time, but I'm also assuming that overhead is undesired and that's why the static variable was introduced. Alternatively _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). 
But if we go this route I think we would need to do a small fix on thaw too. After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). I'm not sure about the lifecycle of the old CodeBlob but it seems it could have been already removed if enterSpecial was not found while traversing everybody's stack. Maybe there are more issues. > > The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another less restrictive option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired. > > I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test. > > Thanks, > Patricio This pull request has now been integrated. Changeset: 8b740b46 Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/8b740b46091c853c7cb66c361deda6dfbb2cedc8 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV Reviewed-by: coleenp, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12911 From dcubed at openjdk.org Thu Mar 9 23:23:26 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 23:23:26 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 21:08:16 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking.
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. 
But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? 
| 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 99 commits: > > - Merge branch 'master' into JDK-8291555-v2 > - Various small fixes and improvements > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use realloc instead of malloc+copy when growing the lock-stack > - Inline initial LockStack stack > - Fix interpreter asymmetric fast-locking > - Fix merge error (move done label into correct places) > - Merge branch 'master' into JDK-8291555-v2 > - Small fixes > - Fix anon owner in fast-path, avoid runtime call (aarch64) > - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 Another chunk of partial review. This time I did the src/hotspot/cpu/aarch64 and src/hotspot/cpu/arm files: src/hotspot/cpu/aarch64/aarch64.ad src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp src/hotspot/cpu/aarch64/stubRoutines_aarch64.cpp src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp src/hotspot/cpu/aarch64/aarch64.ad line 3856: > 3854: // Indicate success at cont. > 3855: __ cmp(oop, oop); > 3856: __ b(count); This code does `b(count)` and the `count` label is after the `cont` label so the comment on L3912 doesn't quite make sense. 
Perhaps: `// Indicate success on completion.` src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp line 68: > 66: > 67: int C2CheckLockStackStub::max_size() const { > 68: return 20; nit - a comment explaining this choice of literal value ('20') would be useful src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp line 79: > 77: > 78: int C2HandleAnonOMOwnerStub::max_size() const { > 79: return 20; nit - a comment explaining this choice of literal value ('20') would be useful src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5491: > 5489: > 5490: > 5491: __ pop_call_clobbered_registers(); nit - double blank lines here for a reason? src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp line 56: > 54: } > 55: > 56: void C1_MacroAssembler::build_frame(int frame_size_in_bytes, int bang_size_in_bytes, int max_monitors) { So the `max_monitors` param is added, but not use of it. Is someone else doing the 32-bit ARM port? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 9 23:23:30 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 23:23:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v15] In-Reply-To: References: Message-ID: <6OKumxjpgT6L0uYvNnYr4tZO3VjvC_ixqFoaRo7bBHs=.8bad7e67-e663-4371-aab8-830df61bbf0e@github.com> On Wed, 8 Mar 2023 18:25:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. 
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ?
| 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Inline initial LockStack stack src/hotspot/cpu/aarch64/aarch64.ad line 3914: > 3912: // Indicate success at cont. > 3913: __ cmp(oop, oop); > 3914: __ b(count); This code does `b(count)` and the `count` label is after the `cont` label so the comment on L3912 doesn't quite make sense. Perhaps: `// Indicate success on completion.` ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Fri Mar 10 01:06:31 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Fri, 10 Mar 2023 01:06:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 21:08:16 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?'
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. 
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. 
>> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ?
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: > > - Merge branch 'master' into JDK-8291555-v2 > - Various small fixes and improvements > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use realloc instead of malloc+copy when growing the lock-stack > - Inline initial LockStack stack > - Fix interpreter asymmetric fast-locking > - Fix merge error (move done label into correct places) > - Merge branch 'master' into JDK-8291555-v2 > - Small fixes > - Fix anon owner in fast-path, avoid runtime call (aarch64) > - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 Another chunk of partial review. This time I did the src/hotspot/cpu/x86 files: src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp src/hotspot/cpu/x86/c1_MacroAssembler_x86.hpp src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp src/hotspot/cpu/x86/interp_masm_x86.cpp src/hotspot/cpu/x86/macroAssembler_x86.cpp src/hotspot/cpu/x86/macroAssembler_x86.hpp src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp src/hotspot/cpu/x86/stubGenerator_x86_64.cpp src/hotspot/cpu/x86/stubGenerator_x86_64.hpp src/hotspot/cpu/x86/stubRoutines_x86.cpp src/hotspot/cpu/x86/stubRoutines_x86.hpp src/hotspot/cpu/x86/x86_32.ad src/hotspot/cpu/x86/x86_64.ad src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 77: > 75: > 76: int C2CheckLockStackStub::max_size() const { > 77: return 10; nit - a comment explaining this choice of literal value ('10') would be useful src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 89: > 87: #ifdef _LP64 > 88: int C2HandleAnonOMOwnerStub::max_size() const { > 89: return 17; nit - a comment explaining this choice of literal value ('17') would be 
useful src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 624: > 622: jmp(COUNT); > 623: #else > 624: // We can not emit the lock-stack-check in verified_entry() because we don't have enough nit typo: s/can not/cannot/ src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 632: > 630: jmp(COUNT); > 631: bind(slow); > 632: testptr(objReg, objReg); // ZF=0 to indicate failure nit perhaps: // force ZF=0 to indicate failure src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 694: > 692: // Invariant: tmpReg == 0. tmpReg is EAX which is the implicit cmpxchg comparand. > 693: lock(); > 694: cmpxchgptr(thread, Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); Now that `fast_lock` is being passed in a `thread` register, you've switched from using `scrReg` to using `thread`. Of course, this means that the comment from L685 -> L691 needs to be revisited/rewritten. Unfortunately, the GitHub UI doesn't let me highlight from L685 -> L690 as part of this comment. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 815: > 813: jccb(Assembler::zero, Stacked); > 814: if (UseFastLocking) { > 815: // If the owner is ANONYMOUS, we need to fix it - in an outline stube. nit typo: s/outline stube/outline stub/ src/hotspot/cpu/x86/interp_masm_x86.cpp line 1357: > 1355: cmpptr(obj_reg, Address(tmp, -oopSize)); > 1356: jcc(Assembler::notEqual, slow_case); > 1357: // Try to swing header from locked to unlock. 
nit typo: s/locked to unlock./locked to unlocked./ src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9733: > 9731: #else > 9732: const Register thread = rax; > 9733: get_thread(rax); Other places that use this idiom do it like this: const Register thread = rax; get_thread(thread); ------------- PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Fri Mar 10 03:54:01 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 10 Mar 2023 03:54:01 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote: > Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active JvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers". > I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix. > > Thanks, > Patricio This looks good. Thank you for catching this and taking care of it! Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12956 From dholmes at openjdk.org Fri Mar 10 04:40:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 04:40:13 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote: > Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active JvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers".
> I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix. > > Thanks, > Patricio src/hotspot/share/prims/jvmtiThreadState.cpp line 372: > 370: java_lang_Thread::dec_VTMS_transition_disable_count(vth()); > 371: Atomic::dec(&_VTMS_transition_disable_for_one_count); > 372: if (_VTMS_transition_disable_for_one_count == 0 || _is_SR) { Sorry I don't understand why this `_is_SR` check was removed. I admit I can't really figure out what this field means anyway, but there is nothing in the issue description that suggests this also needs changing - and it is now different to `VTMS_transition_enable_for_all`. ------------- PR: https://git.openjdk.org/jdk/pull/12956 From cjplummer at openjdk.org Fri Mar 10 05:50:03 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 10 Mar 2023 05:50:03 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:30:27 GMT, Daniel Jeliński wrote: > This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. > > Other changes include: > - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. > - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. > - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. > - zip_util was modified to return static messages instead of generated ones.
The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. > - `getLastErrorString` is no longer exported by libjava. > > Tier1-3 tests continue to pass. > > No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. > Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. I'm approving the SA changes. Thanks for the testing. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12922 From djelinski at openjdk.org Fri Mar 10 08:06:16 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Fri, 10 Mar 2023 08:06:16 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 00:17:42 GMT, Naoto Sato wrote: >> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. >> >> Other changes include: >> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. >> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. >> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior.
>> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. >> - `getLastErrorString` is no longer exported by libjava. >> >> Tier1-3 tests continue to pass. >> >> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. >> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. > > src/java.base/windows/native/libnio/ch/FileDispatcherImpl.c line 208: > >> 206: >> 207: if (result == 0) { >> 208: JNU_ThrowIOExceptionWithLastError(env, "Write failed"); > > Could be replaced with `JNU_ThrowIOException`? If we got here, `WriteFile` just failed and `GetLastError` contains interesting information. `JNU_ThrowIOExceptionWithLastError` will generate a specific error message in the user's language, whereas `JNU_ThrowIOException` would just throw `Write failed`. I don't think we want to change this. ------------- PR: https://git.openjdk.org/jdk/pull/12922 From rkennke at openjdk.org Fri Mar 10 09:41:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 09:41:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v17] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking.
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. 
But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ?
| 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ?
| 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fixes in response to Daniel's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/f9f93b36..51a00e91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=15-16 Stats: 9 lines in 3 files changed: 6 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 10 09:41:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 09:41:27 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 23:17:39 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: >> >> - Merge branch 'master' into JDK-8291555-v2 >> - Various small fixes and improvements >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use realloc instead of malloc+copy when growing the lock-stack >> - Inline initial LockStack stack >> - Fix interpreter asymmetric fast-locking >> - Fix merge error (move done label into correct places) >> - Merge branch 'master' into JDK-8291555-v2 >> - Small fixes >> - Fix anon owner in fast-path, avoid runtime call (aarch64) >> - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 > > src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp line 56: > >> 54: } >> 55: >> 56: void C1_MacroAssembler::build_frame(int frame_size_in_bytes, int bang_size_in_bytes, int max_monitors) { > > So the `max_monitors` param is added, but not use of it. > Is someone else doing the 32-bit ARM port? 
Hopefully :-) I currently can't do it, though. With fast-locking (and the rest of Lilliput) behind an experimental flag, this is probably ok for now? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 10 09:55:03 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 09:55:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v18] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ?
| 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fixes in response to Daniel's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/51a00e91..8ba676a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=16-17 Stats: 20 lines in 4 files changed: 6 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From mgronlun at openjdk.org Fri Mar 10 10:43:23 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 10 Mar 2023 10:43:23 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. 
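To make the JavaAgent notion above concrete, here is a minimal sketch of an agent entry point built on `java.lang.instrument`. The class name `SketchAgent`, its fields, and the `key=value,key=value` option syntax are assumptions made for illustration; they are not part of the PR. Only the `premain`/`agentmain` signatures and the `Premain-Class`/`Agent-Class` manifest attributes come from the `java.lang.instrument` specification.

```java
import java.lang.instrument.Instrumentation;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical agent class; a real agent jar would name it in the manifest
// via Premain-Class (command line) and/or Agent-Class (dynamic load).
final class SketchAgent {
    // Illustrative state only: what the agent was started with.
    static volatile Map<String, String> lastOptions = Map.of();
    static volatile boolean loadedDynamically;

    // Invoked by the JVM before main() for -javaagent:<jar>=<agentArgs>.
    public static void premain(String agentArgs, Instrumentation inst) {
        loadedDynamically = false;
        lastOptions = parseOptions(agentArgs);
    }

    // Invoked by the JVM when the agent is loaded into a running VM.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        loadedDynamically = true;
        lastOptions = parseOptions(agentArgs);
    }

    // Assumed option syntax: comma-separated key=value pairs.
    static Map<String, String> parseOptions(String agentArgs) {
        Map<String, String> options = new LinkedHashMap<>();
        if (agentArgs == null || agentArgs.isEmpty()) {
            return options;
        }
        for (String pair : agentArgs.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                options.put(pair, "");
            } else {
                options.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return options;
    }
}
```

The string after the jar name on the command line reaches the agent unparsed as `agentArgs`; how it is interpreted is entirely up to the agent.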
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
> > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: more cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/abeaa324..741b8686 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=08-09 Stats: 12 lines in 3 files changed: 1 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From rkennke at openjdk.org Fri Mar 10 12:25:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 12:25:09 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v19] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable.
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% > NaiveBayes | 587.607 | 581.608 | 1.03% | ?
| 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use nullptr instead of NULL in touched code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/8ba676a0..8aad280a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=17-18 Stats: 10 lines in 6 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 10 12:36:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 12:36:28 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v20] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the 
current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
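The lock-stack idea described above can be modelled in a few lines of Java. This is only a conceptual sketch under stated assumptions, not the HotSpot implementation (which is C++ inside the VM and stores oops): the class name `LockStackSketch`, the initial capacity of 4, and growth by doubling are all invented for illustration.

```java
import java.util.Arrays;

// Conceptual model of a per-thread lock-stack: a small array of object
// references that records which objects the owning thread has fast-locked.
final class LockStackSketch {
    // One lock-stack per thread, mirroring "thread-local lock-stack".
    static final ThreadLocal<LockStackSketch> CURRENT =
            ThreadLocal.withInitial(LockStackSketch::new);

    private Object[] elems = new Object[4]; // assumed initial capacity
    private int top;                        // number of entries in use

    // Record that the current thread fast-locked 'o', growing if needed.
    void push(Object o) {
        if (top == elems.length) {
            elems = Arrays.copyOf(elems, elems.length * 2); // assumed policy
        }
        elems[top++] = o;
    }

    // "Does the current thread own me?" answered by a short linear scan,
    // comparing by identity and starting at the most recent entry.
    boolean contains(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (elems[i] == o) {
                return true;
            }
        }
        return false;
    }

    // Drop 'o' on unlock; later entries shift down to close the gap.
    void remove(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (elems[i] == o) {
                System.arraycopy(elems, i + 1, elems, i, top - i - 1);
                elems[--top] = null;
                return;
            }
        }
    }

    int size() {
        return top;
    }
}
```

Because the array usually holds only a handful of entries, the linear scan in `contains` stays cheap, which is the property the description relies on for the ownership check.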
>
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks.
IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
>
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> |  | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
>   | x86_64 |   |   |   | aarch64 |   |  
> -- | -- | -- | -- | -- | -- | -- | --
>   | stack-locking | fast-locking |   |   | stack-locking | fast-locking |  
> AkkaUct | 841.884 | 836.948 | 0.59% |  | 1475.774 | 1465.647 | 0.69%
> Reactors | 11041.427 | 11181.451 | -1.25% |  | 11381.751 | 11521.318 | -1.21%
> Als | 1367.183 | 1359.358 | 0.58% |  | 1678.103 | 1688.067 | -0.59%
> ChiSquare | 577.021 | 577.398 | -0.07% |  | 986.619 | 988.063 | -0.15%
> GaussMix | 817.459 | 819.073 | -0.20% |  | 1154.293 | 1155.522 | -0.11%
> LogRegression | 598.343 | 603.371 | -0.83% |  | 638.052 | 644.306 | -0.97%
> MovieLens | 8248.116 | 8314.576 | -0.80% |  | 7569.219 | 7646.828 | -1.01%
> NaiveBayes | 587.607 | 581.608 | 1.03% |  | 541.583 | 550.059 | -1.54%
> PageRank | 3260.553 | 3263.472 | -0.09% |  | 4376.405 | 4381.101 | -0.11%
> FjKmeans | 979.978 | 976.122 | 0.40% |  | 774.312 | 771.235 | 0.40%
> FutureGenetic | 2187.369 | 2183.271 | 0.19% |  | 2685.722 | 2689.056 | -0.12%
> ParMnemonics | 2434.551 | 2468.763 | -1.39% |  | 4278.225 | 4263.863 | 0.34%
> Scrabble | 111.882 | 111.768 | 0.10% |  | 151.796 | 153.959 | -1.40%
> RxScrabble | 210.252 | 211.38 | -0.53% |  | 310.116 | 315.594 | -1.74%
> Dotty | 750.415 | 752.658 | -0.30% |  | 1033.636 | 1036.168 | -0.24%
> ScalaDoku | 3072.05 | 3051.2 | 0.68% |  | 3711.506 | 3690.04 | 0.58%
> ScalaKmeans | 211.427 | 209.957 | 0.70% |  | 264.38 | 265.788 | -0.53%
> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |  | 1088.182 | 1092.266 | -0.37%
> Philosophers | 6450.124 | 6565.705 | -1.76% |  | 12017.964 | 11902.559 | 0.97%
> FinagleChirper | 3953.623 | 3972.647 | -0.48% |  | 4750.751 | 4769.274 | -0.39%
> FinagleHttp | 3970.526 | 4005.341 | -0.87% |  
| 5294.125 | 5296.224 | -0.04%

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Use nullptr instead of NULL in touched code (x86)
 - Use nullptr instead of NULL in touched code (riscv)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/8aad280a..27a4b107

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=18-19

  Stats: 17 lines in 8 files changed: 0 ins; 0 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Fri Mar 10 12:45:12 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 10 Mar 2023 12:45:12 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID:

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
 - Use nullptr instead of NULL in touched code (shared)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/27a4b107..5fe2afcf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=19-20

  Stats: 8 lines in 6 files changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From tschatzl at openjdk.org Fri Mar 10 14:06:16 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 10 Mar 2023 14:06:16 GMT
Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code
Message-ID:

Hi all,

  please review this refactoring that replaces various casts in GC more-or-less related to get all bits set in an uint/size_t with the available constants from cstdint.
The ones in ZGC files were skipped on request.
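As a rough illustration of the kind of replacement this refactoring describes (not taken from the patch itself), the sketch below compares the cast-based ways of producing an all-bits-set value with the named constants. `kNoIndex` and `is_unset` are hypothetical names introduced only for this example; note that `SIZE_MAX` comes from `<cstdint>` while `UINT_MAX` comes from `<climits>`.

```cpp
#include <cassert>
#include <climits>   // UINT_MAX
#include <cstddef>   // std::size_t
#include <cstdint>   // SIZE_MAX

// Cast-based encodings of "all bits set" that are easy to misread.
inline std::size_t all_bits_by_cast()       { return (std::size_t)-1; }
inline std::size_t all_bits_by_complement() { return ~(std::size_t)0; }
inline unsigned int uint_all_bits_by_cast() { return (unsigned int)-1; }

// The same values spelled with the standard constants.
static_assert(static_cast<std::size_t>(-1) == SIZE_MAX, "cast form equals SIZE_MAX");
static_assert(~static_cast<std::size_t>(0) == SIZE_MAX, "complement form equals SIZE_MAX");
static_assert(static_cast<unsigned int>(-1) == UINT_MAX, "cast form equals UINT_MAX");

// Hypothetical sentinel meaning "no index assigned yet".
constexpr std::size_t kNoIndex = SIZE_MAX;

inline bool is_unset(std::size_t index) { return index == kNoIndex; }
```

The values are identical either way; the named constants simply state the intent without requiring the reader to work out what the cast produces.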
Testing: local compilation, gha

Thanks,
  Thomas

-------------

Commit messages:
 - Initial version

Changes: https://git.openjdk.org/jdk/pull/12973/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12973&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8303963
  Stats: 15 lines in 13 files changed: 0 ins; 2 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/12973.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12973/head:pull/12973

PR: https://git.openjdk.org/jdk/pull/12973

From ayang at openjdk.org Fri Mar 10 14:24:07 2023
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 10 Mar 2023 14:24:07 GMT
Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 12:58:42 GMT, Thomas Schatzl wrote:

> Hi all,
>
>   please review this refactoring that replaces various casts in GC more-or-less related to get all bits set in an uint/size_t with the available constants from cstdint.
> The ones in ZGC files were skipped on request.
>
> Testing: local compilation, gha
>
> Thanks,
>   Thomas

Marked as reviewed by ayang (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/12973

From fparain at openjdk.org Fri Mar 10 15:31:16 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Fri, 10 Mar 2023 15:31:16 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 16:25:07 GMT, Coleen Phillimore wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>>
>> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of lack of strong typing and semantic overloading, and poor memory efficiency.
>>
>> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>>
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>>
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>>
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>>
>> Tested with mach5, tier 1 to 7.
>>
>> Thank you.
>
> src/hotspot/share/oops/fieldStreams.hpp line 104:
>
>> 102:     AccessFlags flags;
>> 103:     flags.set_flags(field()->access_flags());
>> 104:     return flags;
>
> Did this used to do this for a reason?

Using the setter rather than the constructor filters out the VM defined flags and keeps only the flags from the class file.

-------------

PR: https://git.openjdk.org/jdk/pull/12855

From pchilanomate at openjdk.org Fri Mar 10 17:03:16 2023
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 10 Mar 2023 17:03:16 GMT
Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 03:51:05 GMT, Serguei Spitsyn wrote:

> This looks good. Thank you for catching and taking care about it! Serguei > Thanks for the review Serguei!
-------------

PR: https://git.openjdk.org/jdk/pull/12956

From pchilanomate at openjdk.org Fri Mar 10 17:11:18 2023
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 10 Mar 2023 17:11:18 GMT
Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 04:34:13 GMT, David Holmes wrote:

>> Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active JvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers".
>> I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix.
>>
>> Thanks,
>> Patricio
>
> src/hotspot/share/prims/jvmtiThreadState.cpp line 372:
>
>> 370:     java_lang_Thread::dec_VTMS_transition_disable_count(vth());
>> 371:     Atomic::dec(&_VTMS_transition_disable_for_one_count);
>> 372:     if (_VTMS_transition_disable_for_one_count == 0 || _is_SR) {
>
> Sorry I don't understand why this `_is_SR` check was removed. I admit I can't really figure out what this field means anyway, but there is nothing in the issue description that suggests this also needs changing - and it is now different to `VTMS_transition_enable_for_all`.

A JvmtiVTMSTransitionDisabler instance that is a "single disabler" only blocks other virtual threads trying to transition or JvmtiVTMSTransitionDisabler monopolists. Both of them will check for _VTMS_transition_disable_for_one_count (the JvmtiVTMSTransitionDisabler monopolist was missing that check) so just checking when that counter is zero is enough. In fact, for a "single disabler" _is_SR is always false so that check wasn't doing anything.

Yes, this is not actually needed for the fix, but when looking at which condition we use to wait and which one to notify I caught this, sorry for not explaining that part. And looking closer at VTMS_transition_enable_for_all() now I see the check for _is_SR is not doing anything either, because if _VTMS_transition_disable_for_all_count was not zero after the decrement then this can't be a JvmtiVTMSTransitionDisabler monopolist, i.e. _is_SR will be false. When a monopolist is running all other "disable all" JvmtiVTMSTransitionDisabler instances, if any, will be waiting in the first "while (_SR_mode)" loop in VTMS_transition_disable_for_all(), so _VTMS_transition_disable_for_all_count will be one throughout the monopolist run. So this should be an assert after the decrement: assert(!_is_SR || _VTMS_transition_disable_for_all_count == 0, "").

-------------

PR: https://git.openjdk.org/jdk/pull/12956

From dcubed at openjdk.org Fri Mar 10 20:14:26 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Fri, 10 Mar 2023 20:14:26 GMT
Subject: Integrated: 8304005: ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in Xcomp mode
Message-ID:

A trivial fix to ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in -Xcomp mode

-------------

Commit messages:
 - 8304005: ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in Xcomp mode

Changes: https://git.openjdk.org/jdk/pull/12983/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12983&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8304005
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/12983.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12983/head:pull/12983

PR: https://git.openjdk.org/jdk/pull/12983

From rriggs at openjdk.org Fri Mar 10 20:14:27 2023
From: rriggs at openjdk.org (Roger Riggs)
Date: Fri, 10 Mar 2023 20:14:27 GMT
Subject: Integrated: 8304005:
ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in Xcomp mode
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 19:53:31 GMT, Daniel D. Daugherty wrote:

> A trivial fix to ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in -Xcomp mode

Marked as reviewed by rriggs (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/12983

From dcubed at openjdk.org Fri Mar 10 20:14:28 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Fri, 10 Mar 2023 20:14:28 GMT
Subject: Integrated: 8304005: ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in Xcomp mode
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 20:01:43 GMT, Roger Riggs wrote:

>> A trivial fix to ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in -Xcomp mode
>
> Marked as reviewed by rriggs (Reviewer).

@RogerRiggs - Thanks for the fast review.

-------------

PR: https://git.openjdk.org/jdk/pull/12983

From dcubed at openjdk.org Fri Mar 10 20:14:30 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Fri, 10 Mar 2023 20:14:30 GMT
Subject: Integrated: 8304005: ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in Xcomp mode
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 19:53:31 GMT, Daniel D. Daugherty wrote:

> A trivial fix to ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in -Xcomp mode

This pull request has now been integrated.

Changeset: d7f4221b
Author:    Daniel D. Daugherty
URL:       https://git.openjdk.org/jdk/commit/d7f4221bfe9637a7961f30a25196a0e3161baafd
Stats:     2 lines in 1 file changed: 2 ins; 0 del; 0 mod

8304005: ProblemList serviceability/AsyncGetCallTrace/MyPackage/ASGCTBaseTest.java on linux-x64 in Xcomp mode

Reviewed-by: rriggs

-------------

PR: https://git.openjdk.org/jdk/pull/12983

From kbarrett at openjdk.org Fri Mar 10 20:36:58 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 10 Mar 2023 20:36:58 GMT
Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 12:58:42 GMT, Thomas Schatzl wrote:

> Hi all,
>
>   please review this refactoring that replaces various casts in GC and more-or-less related to get all bits set in an uint/size_t with the available constants from cstdint.
> The ones in ZGC files were skipped on request.
>
> Testing: local compilation, gha
>
> Thanks,
>   Thomas

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12973

From rriggs at openjdk.org Fri Mar 10 22:05:16 2023
From: rriggs at openjdk.org (Roger Riggs)
Date: Fri, 10 Mar 2023 22:05:16 GMT
Subject: RFR: 8303814: getLastErrorString should avoid charset conversions
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 11:30:27 GMT, Daniel Jeliński wrote:

> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows.
>
> Other changes include:
> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere.
> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set.
> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior.
> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code.
> - `getLastErrorString` is no longer exported by libjava.
>
> Tier1-3 tests continue to pass.
>
> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page.
> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks.

src/java.base/share/native/libjava/jni_util.c line 133:

> 131:     if (s != NULL) {
> 132:         jobject x = NULL;
> 133:         if (messagelen) {

Avoid implicit compare with 0; use `messagelen > 0` or similar.

src/java.base/share/native/libjava/jni_util.c line 137:

> 135:             size_t messageextlen = messagelen + 4;
> 136:             char *str1 = (char *)malloc((messageextlen) * sizeof(char));
> 137:             if (str1 == 0) {

Use NULL when comparing to a pointer.

src/java.base/unix/native/libjava/jni_util_md.c line 69:

> 67:     if (errno == 0) return NULL;
> 68:     getErrorString(errno, buf, sizeof(buf));
> 69:     return (*buf) ? JNU_NewStringPlatform(env, buf) : NULL;

I would have used `buf[0] != 0` since it's declared as a local buffer; but this file doesn't have much of a style to follow.
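The three style points in the review comments above can be shown side by side in a small standalone sketch. This is not the JDK code: `describe_error` and `has_text` are hypothetical helpers invented for the example, and since the sketch is written in C++ it uses `nullptr` where the reviewed C file would use `NULL`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a malloc'ed "<name>: <detail>" string, or a
// copy of name when detail is empty. Returns nullptr if allocation fails.
// The caller releases the result with std::free.
inline char* describe_error(const char* name, const char* detail) {
  const std::size_t name_len = std::strlen(name);
  const std::size_t detail_len = std::strlen(detail);

  if (detail_len > 0) {  // explicit comparison rather than `if (detail_len)`
    const std::size_t total = name_len + detail_len + 3;  // ": " plus NUL
    char* str = static_cast<char*>(std::malloc(total));
    if (str == nullptr) {  // pointer compared against a null pointer, not 0
      return nullptr;
    }
    std::snprintf(str, total, "%s: %s", name, detail);
    return str;
  }

  char* copy = static_cast<char*>(std::malloc(name_len + 1));
  if (copy == nullptr) {
    return nullptr;
  }
  std::memcpy(copy, name, name_len + 1);  // includes the terminating NUL
  return copy;
}

// Explicit test of the first character of a local buffer, in the spirit of
// the suggested `buf[0] != 0`.
inline bool has_text(const char* buf) { return buf[0] != '\0'; }
```

None of these spellings changes behaviour; they make the type of each comparison (count, pointer, character) visible at the point of use.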
------------- PR: https://git.openjdk.org/jdk/pull/12922 From stuefe at openjdk.org Sat Mar 11 14:35:35 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 14:35:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: <0Ui89e1xMwZGBUG7hiyTfpr7FD9lJ5sXJJZg846XG54=.2f7843f0-f62b-43d1-8f4a-f21db7cc3666@github.com> On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. 
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. 
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Use nullptr instead of NULL in touched code (shared)

I'm looking into the arm32 port.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Sat Mar 11 15:17:30 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 11 Mar 2023 15:17:30 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID: <9Z5c4yco0VZ_8emD3C43P3LYtJfItTHkRyj9MCsKcNg=.b4342519-99bb-45d0-8d4b-169624e3ff2d@github.com>

On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Use nullptr instead of NULL in touched code (shared)

Proposal for omitting the lockstack size check (at least 75% of the time):

- We know that Thread as well as grown lockstack backing buffers start at malloc-aligned boundaries. Practically this is 16 (64-bit), 4-8 (32-bit). So at the very least 4.
- Make the initial lockstack this size. Define it so that initial slot stack starts at offset 0.
- Load the current slot pointer as you do now. Check the lowest 2 bits. If all are zero, take the slower path (load the current limit and compare against limit, ...).
- If bit 0 or bit 1 is set, you can omit this check. You are done since you have not yet reached the limit.
- You can expand this proposal to any alignment you like. You need to declare the lockstack slots with `alignof(X)`, and the compiler will take care that the *initial* slot stack is always well aligned.
As for larger slot stacks, we will have to allocate them in an aligned fashion using posix_memalign (we need this as an NMT-wrapped version, but that's trivial)

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Sat Mar 11 16:00:33 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Sat, 11 Mar 2023 16:00:33 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Use nullptr instead of NULL in touched code (shared)

> Proposal for omitting the lockstack size check (at least 75% of the time):
>
> * We know that Thread as well as grown lockstack backing buffers start at malloc-aligned boundaries. Practically this is 16 (64-bit), 4-8 (32-bit). So at the very least 4.
> * Make the initial lockstack this size. Define it so that initial slot stack starts at offset 0.
> * Load the current slot pointer as you do now. Check the lowest 2 bits. If all are zero, take the slower path (load the current limit and compare against limit, ...).
> * If bit 0 or bit 1 is set, you can omit this check. You are done since you have not yet reached the limit.
> * You can expand this proposal to any alignment you like. You need to declare the lockstack slots with `alignof(X)`, and the compiler will take care that the _initial_ slot stack is always well aligned.
> As for larger slot stacks, we will have to allocate them in an aligned fashion using posix_memalign (we need this as an NMT-wrapped version, but that's trivial)

This would only work when pushing a single slot, right? Have you seen what we're doing in the compiled (C1 and C2) paths (in x86_64 and aarch64)? There we make a (conservative) estimate of how many lock-slots are needed in the method, check for enough slots once upon method entry, and then elide the check altogether in the lock-enter implementation.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Sat Mar 11 16:14:30 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 11 Mar 2023 16:14:30 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID: <6JDIBigQkgXxOQDE0UEeZhX8ountKQAliKpynUVzcbY=.abf3a7f5-6744-437c-aee0-86a197248a62@github.com>

On Sat, 11 Mar 2023 15:57:53 GMT, Roman Kennke wrote:

> > Proposal for omitting the lockstack size check (at least 75% of the time):
> >
> > * We know that Thread as well as grown lockstack backing buffers start at malloc-aligned boundaries. Practically this is 16 (64-bit), 4-8 (32-bit). So at the very least 4.
> > * Make the initial lockstack this size. Define it so that initial slot stack starts at offset 0.
> > * Load the current slot pointer as you do now. Check the lowest 2 bits. If all are zero, take the slower path (load the current limit and compare against limit, ...).
> > * If bit 0 or bit 1 is set, you can omit this check. You are done since you have not yet reached the limit.
> > * You can expand this proposal to any alignment you like. You need to declare the lockstack slots with `alignof(X)`, and the compiler will take care that the _initial_ slot stack is always well aligned.
> > As for larger slot stacks, we will have to allocate them in an aligned fashion using posix_memalign (we need this as an NMT-wrapped version, but that's trivial)
>
> This would only work when pushing a single slot, right? Have you seen what we're doing in the compiled (C1 and C2) paths (in x86_64 and aarch64)? There we make a (conservative) estimate of how many lock-slots are needed in the method, check for enough slots once upon method entry, and then elide the check altogether in the lock-enter implementation.

Yeah, I just realized this myself. I started working on the template interpreter first, where we push single stack slots. There it may still make sense.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Sat Mar 11 16:30:38 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 11 Mar 2023 16:30:38 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com>

On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Use nullptr instead of NULL in touched code (shared)

Not a full review, just stuff I stumbled over while looking into the arm port.

General notes:

I dislike the "Fast" moniker for UseFastLocking. Old thin locks were not particularly slow either. Also, I believe I have seen places where "fast locking/unlocking" was used before to describe stack-based locking. Can we name this something like "NewStyleThinLocks" or similar?

src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 821:

> 819: call_VM(noreg,
> 820: CAST_FROM_FN_PTR(address, InterpreterRuntime::monitorenter),
> 821: UseFastLocking ? obj_reg : lock_reg);

The first call to InterpreterRuntime::monitorenter, under UseHeavyMonitors: Please add assert for !UseFastLocking.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234:

> 6232:
> 6233: // Load (object->mark() | 1) into hdr
> 6234: orr(hdr, hdr, markWord::unlocked_value);

I wondered why this is needed. Should we not have the header of an unlocked object in hdr? Or is this a safeguard against a misuse of this function (called with the header of an already locked object)?

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6246:

> 6244: str(obj, Address(t1, 0));
> 6245: add(t1, t1, oopSize);
> 6246: str(t1, Address(rthread, JavaThread::lock_stack_current_offset()));

This, and its counterpart pop, may be worth factoring out

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6267:

> 6265: ldr(t1, Address(rthread, JavaThread::lock_stack_current_offset()));
> 6266: sub(t1, t1, oopSize);
> 6267: str(t1, Address(rthread, JavaThread::lock_stack_current_offset()));

good comments, helpful

src/hotspot/share/opto/c2_CodeStubs.hpp line 89:

> 87: };
> 88:
> 89: class C2CheckLockStackStub : public C2CodeStub {

Badly named, please reconsider. This does not check the lock stack, it grows it. "Check" sounds like a non-modifying state verification. Proposal: C2EnsureLockStackSizeStub

src/hotspot/share/runtime/globals.hpp line 1983:

> 1981: \
> 1982: product(bool, UseFastLocking, false, EXPERIMENTAL, \
> 1983: "Use fast-locking instead of stack-locking") \

Please ergo-disable this for UseHeavyMonitors. One less thing to think about.

src/hotspot/share/runtime/lockStack.hpp line 46:

> 44: void grow(size_t min_capacity);
> 45:
> 46: void validate(const char* msg) const PRODUCT_RETURN;

nit: these functions seem normally to be called "verify"

src/hotspot/share/runtime/lockStack.hpp line 52:

> 50: static ByteSize limit_offset() { return byte_offset_of(LockStack, _limit); }
> 51:
> 52: static void ensure_lock_stack_size(oop* _required_limit);

I would split this, do the comparison inline, only the actual growth in the cpp file.
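
To make the suggested split concrete, here is a minimal standalone sketch of a lock stack whose capacity comparison is inline while the growth is a separate out-of-line function. It is not the code from the PR: `LockStackSketch`, `ensure_size`, the slot-count parameter, the `oop` alias, and the malloc-based doubling are all assumptions made for illustration, and the real `ensure_lock_stack_size` is static and takes a required limit pointer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for HotSpot's oop type; an opaque pointer is enough here.
using oop = void*;

class LockStackSketch {
  oop* _base;     // start of the backing buffer
  oop* _current;  // next free slot
  oop* _limit;    // one past the last slot of the buffer

  // Slow path, declared only; it would be defined in the .cpp file.
  void grow(size_t min_capacity);

public:
  explicit LockStackSketch(size_t initial_capacity = 4) {
    if (initial_capacity == 0) initial_capacity = 1;
    _base = static_cast<oop*>(std::malloc(initial_capacity * sizeof(oop)));
    if (_base == nullptr) std::abort();
    _current = _base;
    _limit = _base + initial_capacity;
  }
  ~LockStackSketch() { std::free(_base); }
  LockStackSketch(const LockStackSketch&) = delete;
  LockStackSketch& operator=(const LockStackSketch&) = delete;

  size_t size() const { return static_cast<size_t>(_current - _base); }
  size_t capacity() const { return static_cast<size_t>(_limit - _base); }

  // Fast path: only the comparison is inline; growth is delegated.
  inline void ensure_size(size_t required_slots) {
    if (static_cast<size_t>(_limit - _current) < required_slots) {
      grow(size() + required_slots);
    }
  }

  void push(oop o) {
    ensure_size(1);
    *_current++ = o;
  }

  oop pop() {
    assert(_current > _base && "pop on empty lock stack");
    return *--_current;
  }

  // Linear scan answering "does this thread hold a fast-lock on o?"
  bool contains(oop o) const {
    for (const oop* p = _base; p < _current; ++p) {
      if (*p == o) return true;
    }
    return false;
  }
};

// Out-of-line growth: double the capacity until min_capacity slots fit.
void LockStackSketch::grow(size_t min_capacity) {
  size_t used = size();
  size_t new_capacity = capacity();
  while (new_capacity < min_capacity) new_capacity *= 2;
  oop* new_base = static_cast<oop*>(std::malloc(new_capacity * sizeof(oop)));
  if (new_base == nullptr) std::abort();
  std::memcpy(new_base, _base, used * sizeof(oop));
  std::free(_base);
  _base = new_base;
  _current = _base + used;
  _limit = _base + new_capacity;
}
```

With this shape, a caller that knows how many slots a method needs could call `ensure_size(n)` once and then push without further checks, which is the pattern described for the compiled paths.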

src/hotspot/share/runtime/lockStack.hpp line 64:

> 62:
> 63:   // GC support
> 64:   inline void oops_do(OopClosure* cl);

Does this need to be nonconst?

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Sat Mar 11 17:28:13 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Sat, 11 Mar 2023 17:28:13 GMT
Subject: RFR: 8304017: ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
Message-ID:

Trivial fixes to ProblemList 3 different tests:
[JDK-8304017](https://bugs.openjdk.org/browse/JDK-8304017) ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
[JDK-8304018](https://bugs.openjdk.org/browse/JDK-8304018) ProblemList javax/swing/JColorChooser/Test6827032.java on windows-x64
[JDK-8304019](https://bugs.openjdk.org/browse/JDK-8304019) ProblemList java/awt/dnd/MissingDragExitEventTest/MissingDragExitEventTest.java on windows-x64

-------------

Commit messages:
 - 8304019: ProblemList java/awt/dnd/MissingDragExitEventTest/MissingDragExitEventTest.java on windows-x64
 - 8304018: ProblemList javax/swing/JColorChooser/Test6827032.java on windows-x64
 - 8304017: ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode

Changes: https://git.openjdk.org/jdk/pull/12990/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12990&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8304017
Stats: 5 lines in 2 files changed: 3 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/12990.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12990/head:pull/12990

PR: https://git.openjdk.org/jdk/pull/12990

From stuefe at openjdk.org Sat Mar 11 17:37:20 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 11 Mar 2023 17:37:20 GMT
Subject: RFR: 8304017: ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
In-Reply-To:
References:
Message-ID:

On Sat, 11 Mar 2023 17:16:46 GMT, Daniel D.
Daugherty wrote:

> Trivial fixes to ProblemList 3 different tests:
> [JDK-8304017](https://bugs.openjdk.org/browse/JDK-8304017) ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
> [JDK-8304018](https://bugs.openjdk.org/browse/JDK-8304018) ProblemList javax/swing/JColorChooser/Test6827032.java on windows-x64
> [JDK-8304019](https://bugs.openjdk.org/browse/JDK-8304019) ProblemList java/awt/dnd/MissingDragExitEventTest/MissingDragExitEventTest.java on windows-x64

Looks good and trivial

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12990

From dcubed at openjdk.org Sat Mar 11 17:41:29 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Sat, 11 Mar 2023 17:41:29 GMT
Subject: RFR: 8304017: ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
In-Reply-To:
References:
Message-ID:

On Sat, 11 Mar 2023 17:34:05 GMT, Thomas Stuefe wrote:

>> Trivial fixes to ProblemList 3 different tests:
>> [JDK-8304017](https://bugs.openjdk.org/browse/JDK-8304017) ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
>> [JDK-8304018](https://bugs.openjdk.org/browse/JDK-8304018) ProblemList javax/swing/JColorChooser/Test6827032.java on windows-x64
>> [JDK-8304019](https://bugs.openjdk.org/browse/JDK-8304019) ProblemList java/awt/dnd/MissingDragExitEventTest/MissingDragExitEventTest.java on windows-x64
>
> Looks good and trivial

@tstuefe - Thanks for the fast review and especially on a Saturday!

-------------

PR: https://git.openjdk.org/jdk/pull/12990

From dcubed at openjdk.org Sat Mar 11 17:41:30 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Sat, 11 Mar 2023 17:41:30 GMT
Subject: Integrated: 8304017: ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
In-Reply-To:
References:
Message-ID:

On Sat, 11 Mar 2023 17:16:46 GMT, Daniel D.
Daugherty wrote:

> Trivial fixes to ProblemList 3 different tests:
> [JDK-8304017](https://bugs.openjdk.org/browse/JDK-8304017) ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
> [JDK-8304018](https://bugs.openjdk.org/browse/JDK-8304018) ProblemList javax/swing/JColorChooser/Test6827032.java on windows-x64
> [JDK-8304019](https://bugs.openjdk.org/browse/JDK-8304019) ProblemList java/awt/dnd/MissingDragExitEventTest/MissingDragExitEventTest.java on windows-x64

This pull request has now been integrated.

Changeset: fbc76c2c
Author: Daniel D. Daugherty
URL: https://git.openjdk.org/jdk/commit/fbc76c2c7866204783803d2ac829fb95b040a015
Stats: 5 lines in 2 files changed: 3 ins; 0 del; 2 mod

8304017: ProblemList com/sun/jdi/InvokeHangTest.java on windows-x64 in vthread mode
8304018: ProblemList javax/swing/JColorChooser/Test6827032.java on windows-x64
8304019: ProblemList java/awt/dnd/MissingDragExitEventTest/MissingDragExitEventTest.java on windows-x64

Reviewed-by: stuefe

-------------

PR: https://git.openjdk.org/jdk/pull/12990

From dcubed at openjdk.org Sat Mar 11 18:17:34 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Sat, 11 Mar 2023 18:17:34 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v18]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 09:55:03 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable.
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixes in response to Daniel's review

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 684:

> 682:   xorptr(tmpReg, tmpReg);
> 683:
> 684:   // Appears unlocked - try to swing _owner from null to curren thread.

nit typo: s/curren thread/current thread/

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Sat Mar 11 18:45:35 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Sat, 11 Mar 2023 18:45:35 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID: <4Wi21QdftsyLGcwMc0P4ho3ZS6VZP4gP0MxWkok_gtM=.8c93b6b3-1f48-4979-833e-275278de9d98@github.com>

On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation.
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Use nullptr instead of NULL in touched code (shared)

Another partial review.
This time I reviewed these files:

src/hotspot/share/c1/c1_Compilation.cpp
src/hotspot/share/c1/c1_Compilation.hpp
src/hotspot/share/c1/c1_GraphBuilder.cpp
src/hotspot/share/c1/c1_LIRAssembler.cpp
src/hotspot/share/c1/c1_MacroAssembler.hpp
src/hotspot/share/c1/c1_Runtime1.cpp
src/hotspot/share/interpreter/interpreterRuntime.cpp
src/hotspot/share/opto/c2_CodeStubs.hpp
src/hotspot/share/opto/compile.cpp
src/hotspot/share/opto/compile.hpp
src/hotspot/share/opto/locknode.cpp
src/hotspot/share/opto/parse1.cpp

src/hotspot/share/opto/compile.hpp line 637:

> 635: void push_monitor() { _max_monitors++; }
> 636: void reset_max_monitors() { _max_monitors = 0; }
> 637: uint max_monitors() { return _max_monitors; }

The prevailing style in this file appears to have some indenting after the ')' and before the '{'. It's somewhat inconsistent as to how much, but mostly more than a single space.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Sat Mar 11 18:52:31 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Sat, 11 Mar 2023 18:52:31 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Use nullptr instead of NULL in touched code (shared)

I was pleasantly surprised at how few C1 and C2 changes were needed. Nice!

At this point, I have not reviewed the 'ppc', 'riscv' or s390 files so I've done a first pass review of 63 of the 74 files. I'll have to double check that I didn't miss anything that I need to review and I'll have to do another crawl thru review pass after letting the code percolate in my brain for a few days without looking at it.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dholmes at openjdk.org Mon Mar 13 05:37:25 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 13 Mar 2023 05:37:25 GMT
Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode
In-Reply-To:
References:
Message-ID:

On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote:

> Please review this small fix.
A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active JvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers".
> I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix.
>
> Thanks,
> Patricio

Looks good.

Thanks.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12956

From dholmes at openjdk.org Mon Mar 13 05:37:28 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 13 Mar 2023 05:37:28 GMT
Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 17:06:58 GMT, Patricio Chilano Mateo wrote:

>> src/hotspot/share/prims/jvmtiThreadState.cpp line 372:
>>
>>> 370:   java_lang_Thread::dec_VTMS_transition_disable_count(vth());
>>> 371:   Atomic::dec(&_VTMS_transition_disable_for_one_count);
>>> 372:   if (_VTMS_transition_disable_for_one_count == 0 || _is_SR) {
>>
>> Sorry I don't understand why this `_is_SR` check was removed. I admit I can't really figure out what this field means anyway, but there is nothing in the issue description that suggests this also needs changing - and it is now different to `VTMS_transition_enable_for_all`.
>
> A JvmtiVTMSTransitionDisabler instance that is a "single disabler" only blocks other virtual threads trying to transition or JvmtiVTMSTransitionDisabler monopolists. Both of them will check for _VTMS_transition_disable_for_one_count (the JvmtiVTMSTransitionDisabler monopolist was missing that check) so just checking when that counter is zero is enough. In fact, for a "single disabler" _is_SR is always false so that check wasn't doing anything.
> Yes, this is not actually needed for the fix, but when looking at which condition we use to wait and which one to notify I caught this, sorry for not explaining that part.
>
> And looking closer at VTMS_transition_enable_for_all() now I see the check for _is_SR is not doing anything too, because if _VTMS_transition_disable_for_all_count was not zero after the decrement then this can't be a JvmtiVTMSTransitionDisabler monopolist, i.e. _is_SR will be false. When a monopolist is running all other "disable all" JvmtiVTMSTransitionDisabler instances if any will be waiting in the first "while (_SR_mode)" loop in VTMS_transition_disable_for_all(), so _VTMS_transition_disable_for_all_count will be one through the monopolist run. So this should be an assert after the decrement: assert(!_is_SR || _VTMS_transition_disable_for_all_count == 0, "").

Thanks for clarifying - I was puzzled by the way `is_SR` was being used.

-------------

PR: https://git.openjdk.org/jdk/pull/12956

From dholmes at openjdk.org Mon Mar 13 06:51:33 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 13 Mar 2023 06:51:33 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote:

>> Greetings,
>>
>> We are adding support to let JFR report on Agents.
>>
>> #### Design
>>
>> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize.
>>
>> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html)
>>
>> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
>>
>> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples:
>>
>> // Command line
>> jdk.JavaAgent {
>>   startTime = 12:31:19.789 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "foo=bar"
>>   dynamic = false
>>   initialization = 12:31:15.574 (2023-03-08)
>>   initializationTime = 172 ms
>> }
>>
>> // Dynamic load
>> jdk.JavaAgent {
>>   startTime = 12:31:31.158 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "bar=baz"
>>   dynamic = true
>>   initialization = 12:31:31.037 (2023-03-08)
>>   initializationTime = 64,1 ms
>> }
>>
>> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
>>
>> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar.
>>
>> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true)
>>
>> "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method.
>>
>> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime".
>>
>> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html).
>> This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>> jdk.NativeAgent {
>>   startTime = 12:31:40.398 (2023-03-08)
>>   name = "jdwp"
>>   options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>   dynamic = false
>>   initialization = 12:31:36.142 (2023-03-08)
>>   initializationTime = 0,00184 ms
>>   path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>> }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported.
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
>>
>> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp.
>>
>> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>>
>> Testing: jdk_jfr, tier 1 - 6
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   more cleanup

Still working through this. A few minor comments below.

src/hotspot/share/prims/agent.cpp line 34:

> 32: }
> 33:
> 34: static const char* allocate_copy(const char* str) {

Why not just use `os::strdup`?

src/hotspot/share/prims/agentList.cpp line 64:

> 62: void AgentList::add_xrun(const char* name, char* options, bool absolute_path) {
> 63: Agent* agent = new Agent(name, options, absolute_path);
> 64: agent->_is_xrun = true;

Why direct access of private field instead of having a setter like other parts of the Agent API?

src/hotspot/share/prims/agentList.cpp line 227:

> 225: * store data in their JvmtiEnv local storage.
> 226: *
> 227: * Please see JPLISAgent.c in module java.instrument, see JPLISAgent.h and JPLISAgent.c.

No need to mention the .c file twice.

src/hotspot/share/prims/agentList.cpp line 419:

> 417: const jint err = (*on_load_entry)(&main_vm, const_cast<char*>(agent->options()), NULL);
> 418: if (err != JNI_OK) {
> 419: vm_exit_during_initialization("-Xrun library failed to init", agent->name());

Do you need to be back in `_thread_in_vm` before exiting?

src/hotspot/share/prims/agentList.cpp line 542:

> 540:
> 541: // Invoke the Agent_OnAttach function
> 542: JavaThread* THREAD = JavaThread::current(); // For exception macros.

Nit: just use `current` rather than `THREAD` and don't use the exception macros.

-------------

PR: https://git.openjdk.org/jdk/pull/12923

From dholmes at openjdk.org Mon Mar 13 06:51:37 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 13 Mar 2023 06:51:37 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v9]
In-Reply-To:
References:
Message-ID:

On Thu, 9 Mar 2023 16:58:42 GMT, Markus Grönlund wrote:

> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   handle multiple envs with same VMInit callback

src/hotspot/share/prims/agent.cpp line 41:

> 39: char* copy = AllocateHeap(length + 1, mtInternal);
> 40: strncpy(copy, str, length + 1);
> 41: assert(strncmp(copy, str, length + 1) == 0, "invariant");

Unclear what you are checking here. Don't you trust strncpy?

-------------

PR: https://git.openjdk.org/jdk/pull/12923

From djelinski at openjdk.org Mon Mar 13 09:48:19 2023
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Mon, 13 Mar 2023 09:48:19 GMT
Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v2]
In-Reply-To:
References:
Message-ID:

> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows.
>
> Other changes include:
> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere.
> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set.
> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior.
> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code.
> - `getLastErrorString` is no longer exported by libjava.
>
> Tier1-3 tests continue to pass.
>
> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page.
> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks.

Daniel Jeliński has updated the pull request incrementally with three additional commits since the last revision:

 - Address review comments
 - Mention that the returned text is static and thread safe
 - Define buffer size

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12922/files
  - new: https://git.openjdk.org/jdk/pull/12922/files/8ab8d729..ea91b651

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12922&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12922&range=00-01

  Stats: 8 lines in 4 files changed: 2 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/12922.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12922/head:pull/12922

PR: https://git.openjdk.org/jdk/pull/12922

From djelinski at openjdk.org Mon Mar 13 09:48:22 2023
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Mon, 13 Mar 2023 09:48:22 GMT
Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 21:47:45 GMT, Roger Riggs wrote:

>> Daniel Jeliński has updated the pull request incrementally with three additional commits
>> since the last revision:
>>
>>  - Address review comments
>>  - Mention that the returned text is static and thread safe
>>  - Define buffer size
>
> src/java.base/share/native/libjava/jni_util.c line 133:
>
>> 131: if (s != NULL) {
>> 132: jobject x = NULL;
>> 133: if (messagelen) {
>
> Avoid implicit compare with 0; use `messagelen > 0` or similar.

preexisting, but fixed.

-------------

PR: https://git.openjdk.org/jdk/pull/12922

From djelinski at openjdk.org Mon Mar 13 09:48:27 2023
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Mon, 13 Mar 2023 09:48:27 GMT
Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 9 Mar 2023 18:08:32 GMT, Naoto Sato wrote:

>> Daniel Jeliński has updated the pull request incrementally with three additional commits since the last revision:
>>
>>  - Address review comments
>>  - Mention that the returned text is static and thread safe
>>  - Define buffer size
>
> src/java.base/share/native/libzip/zip_util.c line 767:
>
>> 765: * or NULL if an error occurred. If a zip error occurred then *pmsg will
>> 766: * be set to the error message text if pmsg != 0. Otherwise, *pmsg will be
>> 767: * set to NULL. Caller doesn't need to free the error message.
>
> I'd put some more context here why the caller does not need to free. (as it is a static text)

Added; also mentioned that we want the buffer to be thread-safe. Let me know if that's what you had in mind.

> src/java.base/windows/native/libjava/jni_util_md.c line 80:
>
>> 78: 0,
>> 79: buf,
>> 80: sizeof(buf) / sizeof(WCHAR),
>
> Maybe better to #define the size 256 so that this division is not needed.

done

-------------

PR: https://git.openjdk.org/jdk/pull/12922

From adinn at openjdk.org Mon Mar 13 09:48:43 2023
From: adinn at openjdk.org (Andrew Dinn)
Date: Mon, 13 Mar 2023 09:48:43 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote:

> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   more cleanup

src/hotspot/share/jfr/metadata/metadata.xml line 1182:

> 1180:
> 1181:
> 1182:

@mgronlun A somewhat drive-by comment.
It might be clearer if you renamed these event fields and accessors, plus also the corresponding fields and accessors in class Agent, as `initializationTime` and `initializationDuration`.

-------------

PR: https://git.openjdk.org/jdk/pull/12923

From adinn at openjdk.org Mon Mar 13 09:52:38 2023
From: adinn at openjdk.org (Andrew Dinn)
Date: Mon, 13 Mar 2023 09:52:38 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v10]
In-Reply-To:
References:
Message-ID:

On Mon, 13 Mar 2023 06:29:11 GMT, David Holmes wrote:

>> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   more cleanup
>
> src/hotspot/share/prims/agentList.cpp line 64:
>
>> 62: void AgentList::add_xrun(const char* name, char* options, bool absolute_path) {
>> 63: Agent* agent = new Agent(name, options, absolute_path);
>> 64: agent->_is_xrun = true;
>
> Why direct access of private field instead of having a setter like other parts of the Agent API?

n.b. that also applies for accesses/updates to field _next.

-------------

PR: https://git.openjdk.org/jdk/pull/12923

From tschatzl at openjdk.org Mon Mar 13 09:59:40 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 13 Mar 2023 09:59:40 GMT
Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 14:20:54 GMT, Albert Mingkun Yang wrote:

>> Hi all,
>>
>> please review this refactoring that replaces various casts in GC and more-or-less related to get all bits set in an uint/size_t with the available constants from cstdint.
>> The ones in ZGC files were skipped on request.
>>
>> Testing: local compilation, gha
>>
>> Thanks,
>> Thomas
>
> Marked as reviewed by ayang (Reviewer).

Thanks @albertnetymk @kimbarrett for your reviews

-------------

PR: https://git.openjdk.org/jdk/pull/12973

From tschatzl at openjdk.org Mon Mar 13 09:59:42 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 13 Mar 2023 09:59:42 GMT
Subject: Integrated: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 12:58:42 GMT, Thomas Schatzl wrote:

> Hi all,
>
> please review this refactoring that replaces various casts in GC and more-or-less related to get all bits set in an uint/size_t with the available constants from cstdint.
> The ones in ZGC files were skipped on request.
>
> Testing: local compilation, gha
>
> Thanks,
> Thomas

This pull request has now been integrated.

Changeset: b575e54b
Author: Thomas Schatzl
URL: https://git.openjdk.org/jdk/commit/b575e54bc96c8fc413893dbbe91d0b5ce0192179
Stats: 15 lines in 13 files changed: 0 ins; 2 del; 13 mod

8303963: Replace various encodings of UINT/SIZE_MAX in gc code

Reviewed-by: ayang, kbarrett

-------------

PR: https://git.openjdk.org/jdk/pull/12973

From stuefe at openjdk.org Mon Mar 13 10:15:37 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 13 Mar 2023 10:15:37 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable.
>> (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
>> When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
>>   Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>>
>> Testing:
>>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> |  | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> |  | x86_64 |  |  |  | aarch64 |  |  |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> |  | stack-locking | fast-locking |  |  | stack-locking | fast-locking |  |
>> | AkkaUct | 841.884 | 836.948 | 0.59% |  | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% |  | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% |  | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% |  | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% |  | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% |  | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% |  | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% |  | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% |  | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% |  | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% |  | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% |  | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% |  | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% |  | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% |  | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% |  | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% |  | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |  | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% |  | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% |  | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% |  | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Use nullptr instead of NULL in touched code (shared)

More comments

src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 884:

> 882: fast_unlock(obj_reg, header_reg, swap_reg, rscratch1, slow_case);
> 883: b(count);
> 884: bind(slow_case);

small nit, move the bind to where the slow case actually is?

src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 892:

> 890: // Test for recursion
> 891: cbz(header_reg, count);
> 892:

Not your patch, but I found interesting that arm does actually zero out the object slot in the BasicLock on the stack. I assume that is not needed, right?
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6220: > 6218: // - obj: the object to be locked > 6219: // - hdr: the header, already loaded from obj, will be destroyed > 6220: // - t1, t2, t3: temporary registers, will be destroyed Adapt comment: we don't use t3 src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6249: > 6247: } > 6248: > 6249: void MacroAssembler::fast_unlock(Register obj, Register hdr, Register t1, Register t2, Label& slow) { Could you add a comment here too as you did for lock? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Mon Mar 13 10:15:40 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 Mar 2023 10:15:40 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> References: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> Message-ID: <58K_81BGoa-o4Pq1LtVhV_fOyYvSIxV1e66syvpefro=.9496823a-207e-467b-917d-f0f8852cda5c@github.com> On Sat, 11 Mar 2023 14:53:29 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use nullptr instead of NULL in touched code (shared) > > src/hotspot/share/runtime/lockStack.hpp line 52: > >> 50: static ByteSize limit_offset() { return byte_offset_of(LockStack, _limit); } >> 51: >> 52: static void ensure_lock_stack_size(oop* _required_limit); > > I would split this, do the comparison inline, only the actual growth in the cpp file. Just realized that this interface is actually a bit odd: since we pass a wish pointer that has nothing to do with either current state nor final result. In fact, the pointer could at the moment point into the lock stack of a different thread. 
So this is "the pointer that would designate the end of the LockStack if the lockstack were enlarged *in-place*". Maybe add a comment like that. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Mon Mar 13 10:24:39 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 Mar 2023 10:24:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: <6jWzeHbL7AH2PDn3-k_3B8jWKfVs3VEG9up7pw265n0=.3ce51a4e-4148-433f-991f-606802d79d50@github.com> On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are performed only outside very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
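To make the two mechanisms in the quoted description concrete, here is a minimal single-threaded sketch of a per-thread lock stack and of the ANONYMOUS_OWNER handover. It is not the code from the PR: `LockStackSketch`, `MonitorSketch`, `ObjRef`, the integer thread ids and the marker values are all hypothetical, and the atomic operations that real code would need on the object header and the monitor owner are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an object reference (an "oop" in HotSpot terms).
using ObjRef = const void*;

// Per-thread lock stack: a small array of the objects this thread has fast-locked.
class LockStackSketch {
  std::vector<ObjRef> _slots;

 public:
  // Record an object after its header was successfully fast-locked.
  void push(ObjRef obj) { _slots.push_back(obj); }

  // Drop the entry for obj, if present; returns whether it was found.
  bool remove(ObjRef obj) {
    auto it = std::find(_slots.begin(), _slots.end(), obj);
    if (it == _slots.end()) return false;
    _slots.erase(it);
    return true;
  }

  // "Does the current thread own me?" answered by a linear scan, which is
  // cheap as long as the array stays small.
  bool contains(ObjRef obj) const {
    return std::find(_slots.begin(), _slots.end(), obj) != _slots.end();
  }

  std::size_t size() const { return _slots.size(); }
};

// Hypothetical owner values; real thread identities are not small integers.
constexpr int NO_OWNER = 0;
constexpr int ANONYMOUS_OWNER = -1;

struct MonitorSketch {
  int owner = NO_OWNER;
};

// A contending thread inflates the lock without knowing who holds it,
// so it records the anonymous marker instead of a real owner.
inline MonitorSketch inflate_on_contention() {
  MonitorSketch m;
  m.owner = ANONYMOUS_OWNER;
  return m;
}

// The thread holding the lock reaches monitorexit. If it sees the anonymous
// marker it must be the owner itself, so it claims ownership and then exits.
// Removing the lock-stack entry at this point is an assumption of this sketch.
inline void exit_monitor(MonitorSketch& m, int self, LockStackSketch& ls, ObjRef obj) {
  if (m.owner == ANONYMOUS_OWNER) {
    assert(ls.contains(obj));  // only the fast-locking thread can get here
    m.owner = self;
    ls.remove(obj);
  }
  assert(m.owner == self);
  m.owner = NO_OWNER;  // hand over to whoever is waiting
}
```

The linear scan in `contains` is the point of the design as described: with only a handful of entries per thread, the ownership check stays cheap without storing a stack pointer in the object header.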
>> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ?
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) Lockstack::grow: I would add a mode (either tied to ASSERT or as a stand-alone switch, I prefer the latter) to give us just as many slots as needed and no more. With NMT on, this will give us trailing canaries so we have overwriter alerts on the next grow. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From fparain at openjdk.org Mon Mar 13 12:09:10 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 12:09:10 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v2] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. 
> > The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: Addressing comments from first reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/42a4d6a0..ce1180ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=00-01 Stats: 111 lines in 13 files changed: 40 ins; 22 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From rriggs at openjdk.org Mon Mar 13 15:10:31 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 13 Mar 2023 15:10:31 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v2] In-Reply-To: References: Message-ID: <88rPs699hRVHL_w1dIfbGgp5fhsoP9a4aTXqtuXZD8g=.1a441cba-0807-4de1-9174-92e2eeff3dad@github.com> On Mon, 13 Mar 2023 09:48:19 GMT,
Daniel Jeliński wrote: >> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. >> >> Other changes include: >> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. >> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. >> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. >> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. >> - `getLastErrorString` is no longer exported by libjava. >> >> Tier1-3 tests continue to pass. >> >> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. >> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. > > Daniel Jeliński has updated the pull request incrementally with three additional commits since the last revision: > > - Address review comments > - Mention that the returned text is static and thread safe > - Define buffer size LGTM, except for some pre-existing code style inconsistencies.
src/java.base/share/native/libzip/zip_util.c line 812: > 810: > 811: if (strlen(name) >= PATH_MAX) { > 812: if (pmsg) { Some pre-existing references to pmsg still use implicit comparison, though not material to this PR, can you update them to compare with NULL. ------------- Marked as reviewed by rriggs (Reviewer). PR: https://git.openjdk.org/jdk/pull/12922 From pchilanomate at openjdk.org Mon Mar 13 15:49:24 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 13 Mar 2023 15:49:24 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 05:34:06 GMT, David Holmes wrote: > Looks good. Thanks. > Thanks for the review David! ------------- PR: https://git.openjdk.org/jdk/pull/12956 From djelinski at openjdk.org Mon Mar 13 15:55:27 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Mon, 13 Mar 2023 15:55:27 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v3] In-Reply-To: References: Message-ID: > This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. > > Other changes include: > - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. > - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. > - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. > - zip_util was modified to return static messages instead of generated ones. 
The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. > - `getLastErrorString` is no longer exported by libjava. > > Tier1-3 tests continue to pass. > > No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. > Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. Daniel Jeliński has updated the pull request incrementally with one additional commit since the last revision: Use NULL where appropriate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12922/files - new: https://git.openjdk.org/jdk/pull/12922/files/ea91b651..efd72a1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12922&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12922&range=01-02 Stats: 12 lines in 1 file changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/12922.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12922/head:pull/12922 PR: https://git.openjdk.org/jdk/pull/12922 From djelinski at openjdk.org Mon Mar 13 16:01:05 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Mon, 13 Mar 2023 16:01:05 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v2] In-Reply-To: <88rPs699hRVHL_w1dIfbGgp5fhsoP9a4aTXqtuXZD8g=.1a441cba-0807-4de1-9174-92e2eeff3dad@github.com> References: <88rPs699hRVHL_w1dIfbGgp5fhsoP9a4aTXqtuXZD8g=.1a441cba-0807-4de1-9174-92e2eeff3dad@github.com> Message-ID: On Mon, 13 Mar 2023 15:05:04 GMT, Roger Riggs wrote: >> Daniel Jeliński has updated the pull request incrementally
with three additional commits since the last revision: >> >> - Address review comments >> - Mention that the returned text is static and thread safe >> - Define buffer size > > src/java.base/share/native/libzip/zip_util.c line 812: > >> 810: >> 811: if (strlen(name) >= PATH_MAX) { >> 812: if (pmsg) { > > Some pre-existing references to pmsg still use implicit comparison, though not material to this PR, can you update them to compare with NULL. Done. ------------- PR: https://git.openjdk.org/jdk/pull/12922 From fparain at openjdk.org Mon Mar 13 16:26:06 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 16:26:06 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. > > The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
> > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: SA additional caching from Chris Plummer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/ce1180ef..322b626d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=01-02 Stats: 78 lines in 2 files changed: 35 ins; 34 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From naoto at openjdk.org Mon Mar 13 16:47:52 2023 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 13 Mar 2023 16:47:52 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v3] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 15:55:27 GMT, Daniel Jeliński wrote: >> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. >> >> Other changes include: >> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. >> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. >> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. >> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code.
>> - `getLastErrorString` is no longer exported by libjava. >> >> Tier1-3 tests continue to pass. >> >> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. >> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. > > Daniel Jeliński has updated the pull request incrementally with one additional commit since the last revision: > > Use NULL where appropriate Marked as reviewed by naoto (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12922 From coleenp at openjdk.org Mon Mar 13 16:51:20 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 16:51:20 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 16:26:06 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
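As a rough illustration of why a compressed stream gives better memory density than fixed-width slots, here is a sketch of a generic variable-length unsigned integer encoding (7 payload bits per byte, LEB128 style). This is deliberately not UNSIGNED5, whose exact byte layout is not described in this thread; `write_varint` and `read_varint` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append value using 7 payload bits per byte. The high bit of each byte
// says whether another byte follows, so small values need only one byte.
inline void write_varint(std::vector<std::uint8_t>& out, std::uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));
}

// Read one value starting at pos and advance pos past it.
// Throws if the input ends early or encodes more than 32 bits.
inline std::uint32_t read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  std::uint32_t result = 0;
  for (int shift = 0; shift <= 28; shift += 7) {
    std::uint8_t b = in.at(pos++);  // at() throws std::out_of_range on truncation
    result |= static_cast<std::uint32_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return result;
  }
  throw std::runtime_error("varint too long for 32 bits");
}
```

With this scheme the three values 3, 300 and 70000 occupy 1 + 2 + 3 = 6 bytes, where three fixed 32-bit slots would take 12. Field metadata is dominated by small indices and offsets, which is the case such encodings are designed for.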
>> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > SA additional caching from Chris Plummer Mostly minor comments, but one .inline.hpp is still included in an .hpp file. I should point out that I only skimmed the SA and JVMCI changes. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2653: > 2651: } > 2652: InstanceKlass* iklass = InstanceKlass::cast(klass); > 2653: if (index < 0 ||index > iklass->total_fields_count()) { nit: need space after || src/hotspot/share/oops/fieldInfo.cpp line 45: > 43: } > 44: > 45: void FieldInfo::print_from_growable_array(GrowableArray* array, outputStream* os, ConstantPool* cp) { For consistency, can you make the outputStream parameter first? src/hotspot/share/oops/instanceKlass.hpp line 32: > 30: #include "oops/constMethod.hpp" > 31: #include "oops/constantPool.hpp" > 32: #include "oops/fieldInfo.inline.hpp" This shouldn't have an inline.hpp inclusion. src/hotspot/share/runtime/vmStructs.cpp line 2304: > 2302: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ > 2303: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ > 2304: declare_constant(FieldInfo::FieldFlags::_ff_contended) \ If there are flags that SA doesn't use, like contended, I don't think they should be included in the information that we pass to SA. ------------- Changes requested by coleenp (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12855 From coleenp at openjdk.org Mon Mar 13 16:51:25 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 16:51:25 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 20:49:04 GMT, Frederic Parain wrote: >> src/hotspot/share/classfile/classFileParser.cpp line 1491: >> >>> 1489: _temp_field_info = new GrowableArray(total_fields); >>> 1490: >>> 1491: ResourceMark rm(THREAD); >> >> Is the ResourceMark ok here or should it go before allocating _temp_field_info? > > _temp_field_info must survive after ClassFileParser::parse_fields() has returned, so definitely after the allocation of _temp_field_info. That being said, I don't see any reason to have a ResourceMark here; it is probably a remnant of some debugging/tracing code. I'll remove it. OK, good. The ResourceMark might be a problem with the GrowableArray if it grows. >> src/hotspot/share/classfile/classFileParser.cpp line 1608: >> >>> 1606: fflags.update_injected(true); >>> 1607: AccessFlags aflags; >>> 1608: FieldInfo fi(aflags, (u2)(injected[n].name_index), (u2)(injected[n].signature_index), 0, fflags); >> >> I don't know why there's a cast here until I read more. If the FieldInfo name_index and signature_index fields are only u2 sized, could you pass this as an int and then in the constructor assert that the value doesn't overflow u2 instead? > > The type of name_index and signature_index is const vmSymbolID, because the names and signatures of injected fields do not come from a constant pool, but from the vmSymbol array. OK, the cast is fine here. >> src/hotspot/share/oops/fieldStreams.hpp line 104: >> >>> 102: AccessFlags flags; >>> 103: flags.set_flags(field()->access_flags()); >>> 104: return flags; >> >> Did this used to do this for a reason?
> > Using the setter rather than the constructor filters out the VM defined flags and keeps only the flags from the class file. I see, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From coleenp at openjdk.org Mon Mar 13 16:51:29 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 16:51:29 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 15:53:03 GMT, Coleen Phillimore wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> SA additional caching from Chris Plummer > > src/hotspot/share/classfile/classFileParser.cpp line 1634: > >> 1632: for(int i = 0; i < _temp_field_info->length(); i++) { >> 1633: name = _temp_field_info->adr_at(i)->name(_cp); >> 1634: sig = _temp_field_info->adr_at(i)->signature(_cp); > > This checking for duplicates looks like a good candidate for a separate function because parse_fields is so long. I'm adding this comment to remember to file an RFE to look into making this function shorter and factor out this code. 
Filed a cleanup RFE: https://bugs.openjdk.org/browse/JDK-8304069 ------------- PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Mon Mar 13 17:32:35 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 17:32:35 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: Message-ID: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> On Mon, 13 Mar 2023 16:41:05 GMT, Coleen Phillimore wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> SA additional caching from Chris Plummer > > src/hotspot/share/runtime/vmStructs.cpp line 2304: > >> 2302: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ >> 2303: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ >> 2304: declare_constant(FieldInfo::FieldFlags::_ff_contended) \ > > If there are flags that SA doesn't use, like contended, I don't think they should be included in the information that we pass to SA. The contended flag is required to be able to decode the compressed stream, because it signals the presence of an optional part of a field description. The only flag that was not required for the decoding of the stream and was not used by the SA was Stable, and I'll remove it in the next commit.
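The point about the contended flag can be shown with a small sketch: when a flag bit announces an optional trailing value, a decoder that does not know that bit cannot tell where the next field record starts. The record layout, `FLAG_CONTENDED` bit position, and all names below are hypothetical and chosen only for illustration; they are not the FieldInfo stream format from the PR, and the input is shown as already-decoded integers for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag bit: when set, one extra value follows the flags.
constexpr std::uint32_t FLAG_CONTENDED = 1u << 0;

struct FieldRecordSketch {
  std::uint32_t name_index = 0;
  std::uint32_t flags = 0;
  std::uint32_t contended_group = 0;  // meaningful only if FLAG_CONTENDED is set
};

// Decode records laid out as: name_index, flags, [contended_group].
// The optional third value is consumed only when the flag says it is present,
// which is why the decoder must understand that flag to stay in sync.
inline std::vector<FieldRecordSketch> decode_fields(const std::vector<std::uint32_t>& stream) {
  std::vector<FieldRecordSketch> fields;
  std::size_t pos = 0;
  while (pos < stream.size()) {
    FieldRecordSketch f;
    f.name_index = stream.at(pos++);
    f.flags = stream.at(pos++);  // at() throws if the record is truncated
    if ((f.flags & FLAG_CONTENDED) != 0) {
      f.contended_group = stream.at(pos++);
    }
    fields.push_back(f);
  }
  return fields;
}
```

A reader that ignored `FLAG_CONTENDED` would misread the optional value of the first record as the name index of the second, which matches the reason given above for keeping that constant visible to the SA.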
------------- PR: https://git.openjdk.org/jdk/pull/12855 From cjplummer at openjdk.org Mon Mar 13 18:02:28 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 13 Mar 2023 18:02:28 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> References: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> Message-ID: On Mon, 13 Mar 2023 17:29:28 GMT, Frederic Parain wrote: >> src/hotspot/share/runtime/vmStructs.cpp line 2304: >> >>> 2302: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ >>> 2303: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ >>> 2304: declare_constant(FieldInfo::FieldFlags::_ff_contended) \ >> >> If there are flags that SA doesn't use, like contended, I don't think they should be included in the information that we pass to SA. > > The contended flag is required to be able to decode the compressed stream, because it signals the presence of an optional part of a field description. The only flag that was not required for the decoding of the stream and was not used by the SA was Stable, and I'll remove it in the next commit. Leaving it in allows the field to be displayed if an SA user ever dumps a FieldFlags object. Generally speaking it is good to keep these structs complete, or at least complete with any info that might be useful when debugging with SA. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From rkennke at openjdk.org Mon Mar 13 18:43:41 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 13 Mar 2023 18:43:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v22] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. 
This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? 
> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Re-design LockStack for faster lock-stack depth-check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/5fe2afcf..0b7be891 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=20-21 Stats: 251 lines in 34 files changed: 28 ins; 144 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From fparain at openjdk.org Mon Mar 13 18:51:17 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 18:51:17 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of lack of strong typing and semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). 
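For readers unfamiliar with this kind of encoding, here is a minimal C++ sketch of an UNSIGNED5-style variable-length integer codec. It is an illustration only and not the code from this PR: the constants (192 terminating byte codes, 64 continuation codes) and the names `encode_u5` and `decode_u5` are assumptions made for the example, and the actual HotSpot implementation may differ in its details. The point it shows is that small values, which are presumably the common case for field metadata, take a single byte, while any 32-bit value fits in at most five.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed parameters for this sketch (not taken from the PR).
constexpr std::uint32_t kH = 64;        // number of "high" continuation codes
constexpr std::uint32_t kL = 256 - kH;  // byte values below 192 end a number
constexpr int kLgH = 6;                 // log2(kH): payload bits per high code
constexpr int kMaxExtra = 4;            // a 32-bit value needs at most 5 bytes

// Append the variable-length encoding of `value` to `out`.
void encode_u5(std::uint32_t value, std::vector<std::uint8_t>& out) {
  std::uint32_t sum = value;
  for (int i = 0;; ++i) {
    if (sum < kL || i == kMaxExtra) {
      // Either a "low" terminating code, or the unconditional fifth byte.
      out.push_back(static_cast<std::uint8_t>(sum));
      return;
    }
    sum -= kL;
    out.push_back(static_cast<std::uint8_t>(kL + (sum % kH)));  // high code
    sum >>= kLgH;
  }
}

// Decode one value starting at `pos`, advancing `pos` past it.
std::uint32_t decode_u5(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  std::uint32_t sum = 0;
  int shift = 0;
  for (int i = 0;; ++i) {
    std::uint32_t b = in[pos++];
    sum += b << shift;
    if (b < kL || i == kMaxExtra) return sum;  // low code or fifth byte
    shift += kLgH;
  }
}
```

In this sketch a decoder needs no length prefix: the first byte below 192 (or the fifth byte) ends the value, so records can be read back to back from one stream.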
> > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distincts flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: Fixes includes and style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/322b626d..12b4f1b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=02-03 Stats: 9 lines in 5 files changed: 3 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Mon Mar 13 18:51:20 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 18:51:20 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> Message-ID: On Mon, 13 Mar 2023 17:59:03 GMT, Chris Plummer wrote: >> The contended flag is required to be able to decode the compressed stream, because it signals the presence of an optional part of a field description. The only flag that was not required for the decoding of the stream and was not used by the SA was Stable, and I'll remove it in the next commit. 
> > Leaving it in allows the field to be displayed if an SA user ever dumps a FieldFlags object. Generally speaking it is good to keep these structs complete, or at least complete with any info that might be useful when debugging with SA. The "stable" flag and the related methods have been preserved in the last commit. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From rkennke at openjdk.org Mon Mar 13 20:02:45 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 13 Mar 2023 20:02:45 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. 
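The anonymous-owner hand-over described above can be sketched roughly as follows. This is a simplified, single-threaded illustration of the state transitions only, not the HotSpot code: the types `Thread`, `Object` and `Monitor`, the helpers `inflate_for_contention` and `monitor_exit`, and the use of a `std::vector` as the lock-stack are all hypothetical stand-ins, and a real monitor would also park and wake the contending thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; these are not HotSpot types.
struct Object {};
struct Thread {
  std::vector<const Object*> lock_stack;  // objects this thread has fast-locked

  // "Does the current thread own me?" answered by a short linear scan.
  bool holds_fast_lock(const Object* o) const {
    return std::find(lock_stack.begin(), lock_stack.end(), o) != lock_stack.end();
  }
};

// Sentinel meaning "some thread owns this monitor, but we do not know which".
Thread* const ANONYMOUS_OWNER = reinterpret_cast<Thread*>(1);

struct Monitor {
  std::atomic<Thread*> owner{nullptr};
};

// Contending thread: inflates a fast-locked object without knowing its owner.
void inflate_for_contention(Monitor& m) { m.owner.store(ANONYMOUS_OWNER); }

// Owning thread at monitorexit. Returns true if `self` released the monitor.
bool monitor_exit(Thread& self, const Object* obj, Monitor& m) {
  if (m.owner.load() == ANONYMOUS_OWNER && self.holds_fast_lock(obj)) {
    // The anonymous owner must be this thread: record that in the monitor
    // and drop the now-inflated object from the lock-stack.
    m.owner.store(&self);
    self.lock_stack.erase(
        std::remove(self.lock_stack.begin(), self.lock_stack.end(), obj),
        self.lock_stack.end());
  }
  if (m.owner.load() != &self) return false;  // not our monitor
  m.owner.store(nullptr);  // release; a real monitor would wake a waiter here
  return true;
}
```

The sketch uses the thread's own lock-stack as the evidence that "it must be itself"; whether the actual change performs exactly this check is an assumption.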
> > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: X86 parts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/0b7be891..75db4f0a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=21-22 Stats: 143 lines in 14 files changed: 0 ins; 124 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From kevinw at openjdk.org Mon Mar 13 20:13:15 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 13 Mar 2023 20:13:15 GMT Subject: RFR: 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. 
[v2] In-Reply-To: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> References: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> Message-ID: On Fri, 3 Mar 2023 11:46:49 GMT, Kevin Walls wrote: >> Deprecate the Java Management Extension (JMX) Subject Delegation feature for removal in a future release. >> >> Given no known usage, there is no replacement feature for JMX Subject Delegation. >> >> CSR is https://bugs.openjdk.org/browse/JDK-8298967 > > Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - deprecation text update > - Revert "RMIConnection throw comments" > > This reverts commit aceb4fe44189245ac702f0c74c2bb1100a6d17fa. > - Merge remote-tracking branch 'upstream/master' into Deprecate_SubjectDelegation > - RMIConnection throw comments > - 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. CSR approved. ------------- PR: https://git.openjdk.org/jdk/pull/11880 From pchilanomate at openjdk.org Mon Mar 13 20:18:03 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 13 Mar 2023 20:18:03 GMT Subject: Integrated: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote: > Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active jvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers". > I attached a simple reproducer to the bug which I used to test the patch. 
Not sure if it was worth adding a test so the patch contains just the fix. > > Thanks, > Patricio This pull request has now been integrated. Changeset: a8f662ec Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/a8f662ecb2cf13ba7fa499b9a9150da4318306a8 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode Reviewed-by: sspitsyn, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/12956 From coleenp at openjdk.org Mon Mar 13 20:42:29 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 20:42:29 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: <3YhDOFYnbJ4QsrXUQbUQfFbXHb75eK1Mowuv9yYaXqE=.62fc2fd0-5384-440d-919a-1c59e9b2f3fb@github.com> On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). 
>> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distincts flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Fixes includes and style All my comments are addressed. Thank you! This is significant work. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855 From coleenp at openjdk.org Mon Mar 13 21:07:10 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 21:07:10 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: <33uTpOt8ALbYOl5axezzxriVn4V1h860H3YWEbJ-PDY=.429dbaaa-e75e-47fb-88aa-3bd451ee4662@github.com> On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. 
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments src/hotspot/cpu/x86/interp_masm_x86.cpp line 2075: > 2073: movptr(cache, Address(rbp, frame::interpreter_frame_cache_offset * wordSize)); > 2074: movptr(cache, Address(cache, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); > 2075: if (is_power_of_2(sizeof(ResolvedIndyEntry))) { This was a good suggestion but I wonder if we should assert ResolvedIndyEntry is a power of 2 so we know if we change the size and make it go the slower path? Or is 32 bit not a power of two and we need this? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From mchung at openjdk.org Mon Mar 13 21:10:36 2023 From: mchung at openjdk.org (Mandy Chung) Date: Mon, 13 Mar 2023 21:10:36 GMT Subject: RFR: 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. [v2] In-Reply-To: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> References: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> Message-ID: On Fri, 3 Mar 2023 11:46:49 GMT, Kevin Walls wrote: >> Deprecate the Java Management Extension (JMX) Subject Delegation feature for removal in a future release. 
>> >> Given no known usage, there is no replacement feature for JMX Subject Delegation. >> >> CSR is https://bugs.openjdk.org/browse/JDK-8298967 > > Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - deprecation text update > - Revert "RMIConnection throw comments" > > This reverts commit aceb4fe44189245ac702f0c74c2bb1100a6d17fa. > - Merge remote-tracking branch 'upstream/master' into Deprecate_SubjectDelegation > - RMIConnection throw comments > - 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. Marked as reviewed by mchung (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11880 From coleenp at openjdk.org Mon Mar 13 21:17:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 21:17:06 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. 
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments I have a couple of very minor comments. This change is great. Thank you! src/hotspot/cpu/x86/templateTable_x86.cpp line 2798: > 2796: bool is_invokevirtual, > 2797: bool is_invokevfinal, /*unused*/ > 2798: bool is_invokedynamic /*unused*/) { Can you remove the parameter since the s390 port is here? src/hotspot/share/oops/resolvedIndyEntry.hpp line 112: > 110: set_flags(has_appendix); > 111: // Set the method last since it is read lock free. > 112: // Resolution is indicated by whether or not he method is set. typo: he -> the ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Mon Mar 13 21:26:06 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 13 Mar 2023 21:26:06 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <33uTpOt8ALbYOl5axezzxriVn4V1h860H3YWEbJ-PDY=.429dbaaa-e75e-47fb-88aa-3bd451ee4662@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> <33uTpOt8ALbYOl5axezzxriVn4V1h860H3YWEbJ-PDY=.429dbaaa-e75e-47fb-88aa-3bd451ee4662@github.com> Message-ID: On Mon, 13 Mar 2023 21:04:22 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Interpreter optimization and comments > > src/hotspot/cpu/x86/interp_masm_x86.cpp line 2075: > >> 2073: movptr(cache, Address(rbp, frame::interpreter_frame_cache_offset * wordSize)); >> 2074: movptr(cache, Address(cache, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); >> 2075: if (is_power_of_2(sizeof(ResolvedIndyEntry))) { > > This was a good suggestion but I wonder if we should assert ResolvedIndyEntry is a power of 2 so we know if we change the size and make it go the slower path? Or is 32 bit not a power of two and we need this? Currently the structure is a power of two on 64 bits but this is not the case on 32 bit systems. 
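The trade-off discussed above (shift when the entry size is a power of two, multiply otherwise) can be illustrated with a small standalone sketch. This is not the interpreter code from the PR: `Entry16`, `Entry12`, `log2_exact` and `entry_byte_offset` are hypothetical names chosen for illustration, and the local `is_power_of_2` only mimics the helper named in the quoted snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for an entry type whose size is (16 bytes) or is
// not (12 bytes) a power of two. They do not mirror ResolvedIndyEntry.
struct Entry16 { uint8_t bytes[16]; };
struct Entry12 { uint8_t bytes[12]; };

// Local helper for this sketch; true only for 1, 2, 4, 8, ...
inline bool is_power_of_2(size_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// log2 of an exact power of two.
inline unsigned log2_exact(size_t x) {
  assert(is_power_of_2(x));
  unsigned n = 0;
  while (x > 1) { x >>= 1; ++n; }
  return n;
}

// Byte offset of element `index` in an array of T: a shift when sizeof(T)
// is a power of two, a general multiply otherwise.
template <typename T>
inline size_t entry_byte_offset(size_t index) {
  if (is_power_of_2(sizeof(T))) {
    return index << log2_exact(sizeof(T));
  } else {
    return index * sizeof(T);
  }
}
```

Both branches return the same offset; the branch only selects the cheaper instruction sequence. That is why an assert on the size would fail on a platform where the size is legitimately not a power of two, as described above for 32-bit systems.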
------------- PR: https://git.openjdk.org/jdk/pull/12778 From dnsimon at openjdk.org Mon Mar 13 21:57:43 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 Mar 2023 21:57:43 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. 
> > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Fixes includes and style src/hotspot/share/jvmci/jvmciEnv.cpp line 1439: > 1437: JNIAccessMark jni(this, THREAD); > 1438: jobject result = jni()->NewObject(JNIJVMCI::FieldInfo::clazz(), > 1439: JNIJVMCI::VMFlag::constructor(), `JNIJVMCI::VMFlag::constructor()` is the wrong constructor. src/hotspot/share/jvmci/jvmciEnv.hpp line 149: > 147: }; > 148: > 149: extern JNIEXPORT jobjectArray c2v_getDeclaredFieldsInfo(JNIEnv* env, jobject, jobject, jlong); What's the purpose of this declaration? I don't think you need it or the `friend` declaration below since `new_FieldInfo(FieldInfo* fieldinfo, JVMCI_TRAPS)` is public. src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 416: > 414: declare_constant(FieldInfo::FieldFlags::_ff_injected) \ > 415: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ > 416: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ I don't think `_ff_generic` is used in the JVMCI Java code so this entry can be deleted. Please double check the other entries. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 814: > 812: HotSpotResolvedObjectTypeImpl resolvedHolder; > 813: try { > 814: resolvedHolder = compilerToVM().resolveFieldInPool(this, index, (HotSpotResolvedJavaMethodImpl) method, (byte) opcode, info); Please update the javadoc for `CompilerToVM.resolveFieldInPool` to reflect the expanded definition of `info`. 
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 88: > 86: > 87: /** > 88: * Lazily initialized cache for FieldInfo nit: missing `.` at end of sentence src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaField.java line 48: > 46: * Returns VM internal flags associated with this field > 47: */ > 48: int getInternalModifiers(); We've never exposed the internal modifiers before in public JVMCI API and we should refrain from doing so until there's a good reason to do so. Please remove this method. test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaField.java line 97: > 95: > 96: @Test > 97: public void getInternalModifiersTest() { No need for this test since the `getInternalModifiers` method should be removed. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From dnsimon at openjdk.org Mon Mar 13 22:06:46 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 Mar 2023 22:06:46 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. 
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments As communicated to Matias earlier via email, the JVMCI changes in this PR look fine. ------------- Marked as reviewed by dnsimon (Committer). PR: https://git.openjdk.org/jdk/pull/12778 From coleenp at openjdk.org Mon Mar 13 22:06:47 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 22:06:47 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. 
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments @dougxc Can you have a look at the jvmci changes in this PR also? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dcubed at openjdk.org Mon Mar 13 23:01:08 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 Mar 2023 23:01:08 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 20:02:45 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. 
>> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. 
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > X86 parts Reviewed v21 changes except for riscv. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Mon Mar 13 23:01:17 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 Mar 2023 23:01:17 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v22] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 18:43:41 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. 
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). 
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Re-design LockStack for faster lock-stack depth-check src/hotspot/share/runtime/lockStack.cpp line 35: > 33: int LockStack::end_offset() { > 34: return in_bytes(JavaThread::lock_stack_base_offset()) + CAPACITY * oopSize; > 35: } It's a little odd to have function `end_offset()` be defined before the `LockStack::LockStack()` constructor. src/hotspot/share/runtime/lockStack.hpp line 51: > 49: public: > 50: static ByteSize offset_offset() { return byte_offset_of(LockStack, _offset); } > 51: static ByteSize base_offset() { return byte_offset_of(LockStack, _base); } nit - you might want to align these '{' indents. src/hotspot/share/runtime/lockStack.inline.hpp line 84: > 82: if (_base[i] == o) { > 83: validate("post-contains"); > 84: validate("post-contains"); Two `validate()` calls in a row. Probably a cut-n-paste error. 
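For readers following the `contains()` snippet above, here is a minimal standalone sketch of a per-thread lock-stack as the quoted PR description presents it: a small array of object references that is scanned linearly to answer "does the current thread own this object?". It is an illustration under stated assumptions, not the `LockStack` class from the PR. `LockStackSketch`, `can_push` and the capacity of 8 are hypothetical, and plain `const void*` stands in for `oop`. The fixed capacity follows the `CAPACITY` constant visible in the quoted `lockStack.cpp` lines.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch only; names and capacity are assumptions.
class LockStackSketch {
  static constexpr int CAPACITY = 8;   // hypothetical fixed size
  const void* _base[CAPACITY];         // fast-locked objects, oldest first
  int _top = 0;                        // number of live entries

public:
  // The caller checks this first; when the stack is full, a real
  // implementation would fall back to a slower locking path.
  bool can_push() const { return _top < CAPACITY; }

  void push(const void* o) {
    assert(can_push());
    _base[_top++] = o;
  }

  const void* pop() {
    assert(_top > 0);
    return _base[--_top];
  }

  // Linear scan from the most recently locked object; cheap because the
  // stack usually holds only a few entries.
  bool contains(const void* o) const {
    for (int i = _top - 1; i >= 0; i--) {
      if (_base[i] == o) {
        return true;
      }
    }
    return false;
  }
};
```

A thread would call `push` after successfully fast-locking an object and `pop` on unlock. Locks are released in reverse order of acquisition in structured Java code, which is why a stack is a natural fit here.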
src/hotspot/share/runtime/synchronizer.cpp line 514: > 512: } > 513: } > 514: } else { Consider adding a comment after L513 and before L514: ` // No room on the lock_stack so fall-through to inflate-enter.` src/hotspot/share/runtime/vmStructs.cpp line 704: > 702: nonstatic_field(JavaThread, _lock_stack, LockStack) \ > 703: nonstatic_field(LockStack, _offset, int) \ > 704: nonstatic_field(LockStack, _base[0], oop) \ It surprises me that you can specify `_base[0]` here. nit: the indent before the backslash is off. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Mon Mar 13 23:13:56 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 Mar 2023 23:13:56 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 20:02:45 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. 
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > X86 parts Also reviewed v22 changes; no comments on those. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From fparain at openjdk.org Mon Mar 13 23:28:41 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 23:28:41 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. 
>> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments Marked as reviewed by fparain (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/12778 From cjplummer at openjdk.org Tue Mar 14 01:28:57 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 14 Mar 2023 01:28:57 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. 
The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Fixes includes and style Changes requested by cjplummer (Reviewer). src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 75: > 73: int initialValueIndex; > 74: int genericSignatureIndex; > 75: int contendedGroup; It seems that these should all be shorts. All the getter methods are casting them to short. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 99: > 97: if (fieldIsInitialized(fieldInfoValues.fieldFlags)) fieldInfoValues.initialValueIndex = crs.readInt(); // read initial value index > 98: if (fieldIsGeneric(fieldInfoValues.fieldFlags)) fieldInfoValues.genericSignatureIndex = crs.readInt(); // read generic signature index > 99: if (fieldIsContended(fieldInfoValues.fieldFlags)) fieldInfoValues.contendedGroup = crs.readInt(); // read contended group Column width is too wide. These lines would be easier to read if you made each one multiple lines with curly braces.
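As a rough sketch of the multi-line layout suggested above: the class, the flag constants and the use of `PrimitiveIterator.OfInt` as a stand-in for the compressed read stream are all hypothetical and are not the SA code. Only the shape of the `if` blocks is the point.

```java
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

public class FieldInfoReadSketch {
    // Hypothetical flag bits, chosen only for this illustration.
    public static final int FLAG_INITIALIZED = 1 << 0;
    public static final int FLAG_GENERIC = 1 << 1;
    public static final int FLAG_CONTENDED = 1 << 2;

    // Stand-in for the holder of the decoded per-field values.
    public static final class FieldInfoValues {
        public int fieldFlags;
        public int initialValueIndex;
        public int genericSignatureIndex;
        public int contendedGroup;
    }

    // Reads the optional trailing values of one field record.
    // Each value is present in the stream only if its flag is set.
    public static FieldInfoValues readOptionalValues(int fieldFlags, PrimitiveIterator.OfInt crs) {
        FieldInfoValues values = new FieldInfoValues();
        values.fieldFlags = fieldFlags;
        if ((fieldFlags & FLAG_INITIALIZED) != 0) {
            // read initial value index
            values.initialValueIndex = crs.nextInt();
        }
        if ((fieldFlags & FLAG_GENERIC) != 0) {
            // read generic signature index
            values.genericSignatureIndex = crs.nextInt();
        }
        if ((fieldFlags & FLAG_CONTENDED) != 0) {
            // read contended group
            values.contendedGroup = crs.nextInt();
        }
        return values;
    }

    public static void main(String[] args) {
        // A field that is initialized and contended, but not generic.
        FieldInfoValues v = readOptionalValues(
                FLAG_INITIALIZED | FLAG_CONTENDED,
                IntStream.of(7, 9).iterator());
        System.out.println(v.initialValueIndex + " " + v.genericSignatureIndex + " " + v.contendedGroup);
    }
}
```

With one statement per line, each line stays well under the usual column limit and the comment sits next to the read it describes.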
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 107: > 105: int javafieldsCount = crs.readInt(); // read num_java_fields > 106: int VMFieldsCount = crs.readInt(); // read num_injected_fields; > 107: int numFields = javafieldsCount + VMFieldsCount; VMFieldsCount -> vmFieldsCount, or maybe just use num_java_fields and num_injected_fields src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 285: > 283: public short getFieldNameIndex(int index) { > 284: if (index >= getJavaFieldsCount()) throw new IndexOutOfBoundsException("not a Java field;"); > 285: return (short)getField(index).getNameIndex(); Cast to short not needed src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 303: > 301: public short getFieldSignatureIndex(int index) { > 302: if (index >= getJavaFieldsCount()) throw new IndexOutOfBoundsException("not a Java field;"); > 303: return (short)getField(index).getGenericSignatureIndex(); Cast to short is not needed src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 321: > 319: public short getFieldInitialValueIndex(int index) { > 320: if (index >= getJavaFieldsCount()) throw new IndexOutOfBoundsException("not a Java field;"); > 321: return (short)getField(index).getInitialValueIndex(); cast to short is not needed src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 325: > 323: > 324: public int getFieldOffset(int index) { > 325: return (int)getField(index).getOffset(); Cast to int is not needed ------------- PR: https://git.openjdk.org/jdk/pull/12855 From fyang at openjdk.org Tue Mar 14 01:44:03 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Mar 2023 01:44:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v22] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 18:43:41 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking 
implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed.
This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Re-design LockStack for faster lock-stack depth-check

src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp line 87:

> 85:   __ sub(t, t, oopSize);
> 86:   __ str(t, Address(rthread, JavaThread::lock_stack_offset_offset()));
> 87:

It looks to me that the '_offset' of LockStack should be updated with ldrw, subw and strw instructions here.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From cjplummer at openjdk.org Tue Mar 14 02:03:32 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Tue, 14 Mar 2023 02:03:32 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>>
>> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency.
>>
>> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>>
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>>
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Fixes includes and style src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 108: > 106: CLASS_STATE_INITIALIZATION_ERROR = db.lookupIntConstant("InstanceKlass::initialization_error").intValue(); > 107: // We need a new fieldsCache each time we attach. > 108: fieldsCache = new HashMap(); This should probably be a WeakHashMap. I tried it and it seems to work (or at least didn't cause any problems). However, when doing a heap dump I didn't notice the table being any smaller on exit when it was made weak, even though there were numerous GCs while dumping the heap. The key is the Address of the hotspot InstanceKlass instance, and this Address is referenced by the SA InstanceKlass mirror. So theoretically when the reference to the mirror goes away, then the cache entry can be cleared. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From duke at openjdk.org Tue Mar 14 02:52:11 2023 From: duke at openjdk.org (liach) Date: Tue, 14 Mar 2023 02:52:11 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API Message-ID: Summaries: 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. 3.
Most tests are included in tier1, but some are not: In `:jdk_io`: (tier2, part 2) test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java test/jdk/java/io/Serializable/records/ProhibitedMethods.java test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java In `:jdk_instrument`: (tier 3) test/jdk/java/lang/instrument/RetransformAgent.java test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java test/jdk/java/lang/instrument/asmlib/Instrumentor.java @asotona Would you mind reviewing? ------------- Commit messages: - Convert test/jdk/java ASM tests to classfile api Changes: https://git.openjdk.org/jdk/pull/13009/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294977 Stats: 1913 lines in 31 files changed: 283 ins; 888 del; 742 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009 From lmesnik at openjdk.org Tue Mar 14 03:29:32 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 14 Mar 2023 03:29:32 GMT Subject: RFR: 8303705: Field sleeper.started should be volatile JdbLockTestTarg.java Message-ID: Field has been made volatile. ------------- Commit messages: - Variable is made volatile. 
Changes: https://git.openjdk.org/jdk/pull/13010/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13010&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303705 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13010.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13010/head:pull/13010 PR: https://git.openjdk.org/jdk/pull/13010 From dholmes at openjdk.org Tue Mar 14 05:18:26 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 05:18:26 GMT Subject: RFR: 8303705: Field sleeper.started should be volatile JdbLockTestTarg.java In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 03:23:05 GMT, Leonid Mesnik wrote: > Field has been made volatile. Fine and trivial. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/13010 From dholmes at openjdk.org Tue Mar 14 05:53:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 05:53:28 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: <5bzaYlM6HXfUNJITjTSIaGgcJ_51OQf6XWr07w__wUw=.d0a9ac8b-a9a1-4122-9d2f-880de717d071@github.com> References: <5bzaYlM6HXfUNJITjTSIaGgcJ_51OQf6XWr07w__wUw=.d0a9ac8b-a9a1-4122-9d2f-880de717d071@github.com> Message-ID: <9WvL-zpi-ekddOKD2iqHtRAmqFJwwtL1gwKxnsLtA7A=.60564e75-e261-4ed0-a89f-8179c7ffdaa5@github.com> On Thu, 9 Mar 2023 09:29:41 GMT, Markus Grönlund wrote: >> src/hotspot/share/runtime/threads.cpp line 338: >> >>> 336: if (EagerXrunInit && Arguments::init_libraries_at_startup()) { >>> 337: create_vm_init_libraries(); >>> 338: } >> >> Not obvious where this went. Changes to the initialization order can be very problematic. > > Thanks, David. Two calls to launch XRun agents are invoked during startup, and they depend on the EagerXrunInit option.
The !EagerXrunInit case is already located in create_vm(), but the EagerXrunInit was located as the first entry in initialize_java_lang_classes(), which I thought was tucked away a bit unnecessarily. > > I hoisted the EagerXrunInit case from initialize_java_lang_classes() up to create_vm(). It's now the call just before initialize_java_lang_classes(). > > This made it clearer, i.e. to have both calls located directly in create_vm(). Thanks for clarifying. That makes sense. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From dholmes at openjdk.org Tue Mar 14 06:03:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 06:03:37 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
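For readers less familiar with Java agents, a minimal sketch of the two entry points defined by the java.lang.instrument package follows. The class name and the two recording fields are hypothetical, chosen only for illustration. A real agent jar would additionally name this class in its manifest via the `Premain-Class` attribute (command-line start) and/or the `Agent-Class` attribute (dynamic load).

```java
import java.lang.instrument.Instrumentation;

public class JavaAgentSketch {
    // Hypothetical fields that just record how the agent was started.
    public static volatile String lastOptions;
    public static volatile boolean loadedDynamically;

    // Invoked by the JVM before main() when the agent is given on the
    // command line, e.g. -javaagent:JavaAgent.jar=foo=bar
    public static void premain(String agentArgs, Instrumentation inst) {
        lastOptions = agentArgs;      // "foo=bar" in the example above
        loadedDynamically = false;
    }

    // Invoked when the agent is loaded into an already running JVM.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        lastOptions = agentArgs;
        loadedDynamically = true;
    }
}
```

The time spent inside these two methods is what the "initializationTime" field of the proposed jdk.JavaAgent event is described as measuring.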
>>
>> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples:
>>
>> // Command line
>> jdk.JavaAgent {
>>   startTime = 12:31:19.789 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "foo=bar"
>>   dynamic = false
>>   initialization = 12:31:15.574 (2023-03-08)
>>   initializationTime = 172 ms
>> }
>>
>> // Dynamic load
>> jdk.JavaAgent {
>>   startTime = 12:31:31.158 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "bar=baz"
>>   dynamic = true
>>   initialization = 12:31:31.037 (2023-03-08)
>>   initializationTime = 64,1 ms
>> }
>>
>> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
>>
>> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar.
>>
>> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true).
>>
>> "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method.
>>
>> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime".
>>
>> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>> jdk.NativeAgent {
>>   startTime = 12:31:40.398 (2023-03-08)
>>   name = "jdwp"
>>   options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>   dynamic = false
>>   initialization = 12:31:36.142 (2023-03-08)
>>   initializationTime = 0,00184 ms
>>   path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>> }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported.
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp.
>> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > more cleanup I've had a good look through now and have a better sense of the refactoring. Seems good. I'll wait for any tweaks before hitting the approve button though. Thanks ------------- PR: https://git.openjdk.org/jdk/pull/12923 From alanb at openjdk.org Tue Mar 14 08:00:09 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 14 Mar 2023 08:00:09 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 02:43:41 GMT, liach wrote: > Summaries: > 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. > 3.
Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Good to see these tests converted, just a few nits about trying to keep the code/style consistent with the existing code/style where possible. test/jdk/java/lang/ModuleTests/AnnotationsTest.java line 146: > 144: */ > 145: static byte[] addDeprecated(byte[] bytes, boolean forRemoval, String since) { > 146: return Classfile.parse(bytes).transform(ClassTransform.ACCEPT_ALL.andThen(ClassTransform.endHandler(clb -> { The conversion of this test okay but would be good if you split up the overly long lines as they are inconsistent with everything else in this test and makes it annoying to look at the changes side-by-side. test/jdk/java/lang/invoke/defineHiddenClass/BasicTest.java line 282: > 280: > 281: assertTrue(hc.isHidden()); > 282: assertEquals(hc.getModifiers(), accessFlags.stream().mapToInt(AccessFlag::mask).reduce(AccessFlag.PUBLIC.mask(), (a, b) -> a | b)); Do you mind splitting this line too, it's 140+ characters long so impossible to look at the changes side-by-side. test/jdk/java/util/ServiceLoader/BadProvidersTest.java line 216: > 214: clb.withSuperclass(CD_Object); > 215: clb.withFlags(AccessFlag.PUBLIC, AccessFlag.SUPER); > 216: var provider$1Desc = ClassDesc.of("p", "ProviderFactory$1"); This is class descriptor for ProviderFactory$1, not "Provider" so maybe rename this to providerFactory1 or something a bit clearer. 
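One way the long `assertEquals` line in BasicTest.java flagged above could be split is to pull the mask computation into a small helper with one stream operation per line. This is only a sketch: the class and method names are hypothetical, and it assumes a JDK that has `java.lang.reflect.AccessFlag` (JDK 20 or later).

```java
import java.lang.reflect.AccessFlag;
import java.util.Set;

public class AccessFlagMaskSketch {
    // ORs together the masks of the given flags, starting from the PUBLIC
    // mask, which mirrors the reduce() call in the reviewed test line.
    public static int expectedModifiers(Set<AccessFlag> accessFlags) {
        return accessFlags.stream()
                .mapToInt(AccessFlag::mask)
                .reduce(AccessFlag.PUBLIC.mask(), (a, b) -> a | b);
    }

    public static void main(String[] args) {
        int mods = expectedModifiers(Set.of(AccessFlag.FINAL, AccessFlag.SYNTHETIC));
        System.out.println(Integer.toHexString(mods));
    }
}
```

The assertion in the test would then shrink to something like `assertEquals(hc.getModifiers(), expectedModifiers(accessFlags));`, which fits comfortably in a side-by-side diff.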
------------- PR: https://git.openjdk.org/jdk/pull/13009 From rrich at openjdk.org Tue Mar 14 09:24:08 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 14 Mar 2023 09:24:08 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments @matias9927 can I ask you to merge master? There seem to be conflicts (at least I see a message "This branch has conflicts that must be resolved"). I'd like to give the change a spin in our CI testing.
This requires that it can be applied on master. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rkennke at openjdk.org Tue Mar 14 10:19:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 10:19:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 20:02:45 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. 
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> X86 parts

Thank you all for your review comments! I will address them today.

Yesterday I pushed a rather significant change that probably warrants some explanation. Previously, the lock-stack could be grown when its capacity was no longer sufficient. However, that meant that we needed to maintain 3 pointers: the stack base, the current stack-pointer and the limit. Also, checking for room on the lock-stack involved loading two of these three pointers (current and limit) and comparing them. This used to be tricky because it required two registers on some platforms.

The insight that leads to the improved implementation is that the lock-stack is usually very shallow: I did an experiment with several workloads yesterday and it never exceeded a depth of 5. I have now made the lock-stack fixed-size with 8 elements. If the lock-stack is ever full, then we have to bite the bullet and inflate the monitor, but this should be very rare. On the upside, the check for room on the lock-stack is now much simpler: we only need to load the current stack offset and compare it to the end offset, which is a constant and can be encoded as an immediate. Also, the current 'pointer' is now an offset relative to the beginning of the JavaThread structure; this way the lock-stack can be addressed using indirect addressing on rthread.

Additionally, I eliminated the code that checks for enough lock-stack upon method entry. This has not been very useful and often led to excessive lock-stack growth.

@RealFYang You may want to update the RISCV code to reflect those latest changes, otherwise it would now be broken.

I will now address your comments and also change the implementation of SA.
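For readers following the thread, the fixed-size lock-stack described above can be modelled roughly as follows. This is only a sketch in plain Java under stated assumptions, not the HotSpot implementation: the class name `LockStackSketch` and its methods are hypothetical, a Java array index stands in for the offset kept in the JavaThread structure, and "inflate the monitor" is represented simply by `tryPush` returning false.

```java
final class LockStackSketch {
    // Fixed capacity of 8 entries, as stated in the message above.
    static final int CAPACITY = 8;

    private final Object[] slots = new Object[CAPACITY];
    private int top = 0; // index of the next free slot

    // The room check compares one variable against a compile-time constant.
    boolean isFull() {
        return top == CAPACITY;
    }

    // Records obj as fast-locked by this (model) thread. Returns false when
    // the stack is full; in that case the caller would fall back to a full monitor.
    boolean tryPush(Object obj) {
        if (isFull()) {
            return false;
        }
        slots[top++] = obj;
        return true;
    }

    // Answers "does the current thread own me?" with a short linear scan,
    // comparing by identity.
    boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == obj) {
                return true;
            }
        }
        return false;
    }

    // Removes the most recently pushed entry for obj, shifting later entries down.
    boolean remove(Object obj) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == obj) {
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null;
                return true;
            }
        }
        return false;
    }
}
```

With at most 8 entries the scan in `contains` touches very little memory, which fits the observation that the stack rarely grows beyond a depth of 5.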
Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Tue Mar 14 10:49:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 10:49:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v24] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array.
More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code.
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. 
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use -w instructions in new locking code stubs (aarch64) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/75db4f0a..87b95bf7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=22-23 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From mgronlun at openjdk.org Tue Mar 14 12:25:54 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 14 Mar 2023 12:25:54 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: <5uEVRUEr0vSBiTHqKpKVwG1k-v5UrFr9RVAip3K8NSg=.a7bf35b5-b372-4ba6-b217-642c6ad4e2a8@github.com> On Mon, 13 Mar 2023 06:22:21 GMT, David Holmes wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> more cleanup > > src/hotspot/share/prims/agent.cpp line 34: > >> 32: } >> 33: >> 34: static const char* allocate_copy(const char* str) { > > Why not just use `os::strdup`? Better alternative, thanks David. > src/hotspot/share/prims/agentList.cpp line 227: > >> 225: * store data in their JvmtiEnv local storage. >> 226: * >> 227: * Please see JPLISAgent.c in module java.instrument, see JPLISAgent.h and JPLISAgent.c. > > No need to mention the .c file twice. Good point.
> src/hotspot/share/prims/agentList.cpp line 419: > >> 417: const jint err = (*on_load_entry)(&main_vm, const_cast(agent->options()), NULL); >> 418: if (err != JNI_OK) { >> 419: vm_exit_during_initialization("-Xrun library failed to init", agent->name()); > > Do you need to be back in `_thread_in_vm` before exiting? Hmm. This was ported as is. I will double-check. > src/hotspot/share/prims/agentList.cpp line 542: > >> 540: >> 541: // Invoke the Agent_OnAttach function >> 542: JavaThread* THREAD = JavaThread::current(); // For exception macros. > > Nit: just use `current` rather than `THREAD` and don't use the exception macros. Ported as is but good point, will update. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:26:00 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 14 Mar 2023 12:26:00 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v9] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 06:57:46 GMT, David Holmes wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> handle multiple envs with same VMInit callback > > src/hotspot/share/prims/agent.cpp line 41: > >> 39: char* copy = AllocateHeap(length + 1, mtInternal); >> 40: strncpy(copy, str, length + 1); >> 41: assert(strncmp(copy, str, length + 1) == 0, "invariant"); > > Unclear what you are checking here. Don't you trust strncpy? Maybe a bit paranoid, yes. I can clean up.
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:26:03 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 14 Mar 2023 12:26:03 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: <-OvxuwPKbYU514MyCXdcC5-0Nt1ftlipuUFueCe3DGc=.3b0140b3-ef41-4ad6-9515-ec6e9ef40250@github.com> On Mon, 13 Mar 2023 09:49:39 GMT, Andrew Dinn wrote: >> src/hotspot/share/prims/agentList.cpp line 64: >> >>> 62: void AgentList::add_xrun(const char* name, char* options, bool absolute_path) { >>> 63: Agent* agent = new Agent(name, options, absolute_path); >>> 64: agent->_is_xrun = true; >> >> Why direct access of private field instead of having a setter like other parts of the Agent API? > > n.b. that also applies for accesses/updates to field _next. I wanted all accesses to use the iterator. The only access is given to the iterator and AgentList by way of being friends. No need to expose more. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:29:10 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 14 Mar 2023 12:29:10 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 06:01:05 GMT, David Holmes wrote: > I've had a good look through now and have a better sense of the refactoring. Seems good. > > > > I'll wait for any tweaks before hitting the approve button though. > > > > Thanks Thanks so much for taking a look. I realized that implementation details of loading should probably reside in agent.cpp, not agentList.cpp. I am currently off on vacation and will update when back. Thanks also to Andrew Dinn for comments. 
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:29:14 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 14 Mar 2023 12:29:14 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 09:46:04 GMT, Andrew Dinn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> more cleanup > > src/hotspot/share/jfr/metadata/metadata.xml line 1182: > >> 1180: >> 1181: >> 1182: > > @mgronlun A somewhat drive-by comment. It might be clearer if you renamed these event fields and accessors, plus also the corresponding fields and accessors in class Agent, as `initializationTime` and `initializationDuration`. Makes sense. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From fparain at openjdk.org Tue Mar 14 13:15:54 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 13:15:54 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: On Mon, 13 Mar 2023 21:53:37 GMT, Doug Simon wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaField.java line 48: > >> 46: * Returns VM internal flags associated with this field >> 47: */ >> 48: int getInternalModifiers(); > > We've never exposed the internal modifiers before in public JVMCI API and we should refrain from doing so until there's a good reason to do so. Please remove this method.
Access to internal modifiers is needed in `HotSpotResolvedJavaFieldTest.testEquivalenceForInternalFields()`. I moved the declaration of the method to `HotSpotResolvedJavaField`. Does this change work for you? ------------- PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Tue Mar 14 13:19:37 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 13:19:37 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: On Mon, 13 Mar 2023 21:44:59 GMT, Doug Simon wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 416: > >> 414: declare_constant(FieldInfo::FieldFlags::_ff_injected) \ >> 415: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ >> 416: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ > > I don't think `_ff_generic` is used in the JVMCI Java code so this entry can be deleted. Please double check the other entries. _ff_generic removed. _ff_stable is used in `HotSpotResolvedJavaFieldImpl.isStable()`. 
_ff_injected is used in `HotSpotResolvedJavaFieldImpl.isInternal()` ------------- PR: https://git.openjdk.org/jdk/pull/12855 From coleenp at openjdk.org Tue Mar 14 13:21:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 14 Mar 2023 13:21:55 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> Message-ID: On Mon, 13 Mar 2023 21:05:11 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Interpreter optimization and comments > > src/hotspot/cpu/x86/templateTable_x86.cpp line 2798: > >> 2796: bool is_invokevirtual, >> 2797: bool is_invokevfinal, /*unused*/ >> 2798: bool is_invokedynamic /*unused*/) { > > Can you remove the parameter since the s390 port is here? Ok, never mind, I saw s390 port but it doesn't seem to be in these changes (?) 
------------- PR: https://git.openjdk.org/jdk/pull/12778 From fparain at openjdk.org Tue Mar 14 13:40:55 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 13:40:55 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: On Mon, 13 Mar 2023 21:35:17 GMT, Doug Simon wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > > src/hotspot/share/jvmci/jvmciEnv.hpp line 149: > >> 147: }; >> 148: >> 149: extern JNIEXPORT jobjectArray c2v_getDeclaredFieldsInfo(JNIEnv* env, jobject, jobject, jlong); > > What's the purpose of this declaration? I don't think you need it or the `friend` declaration below since `new_FieldInfo(FieldInfo* fieldinfo, JVMCI_TRAPS)` is public. Without this declaration, builds fail on Windows with this error: `error C2375: 'c2v_getDeclaredFieldsInfo': redefinition; different linkage` ------------- PR: https://git.openjdk.org/jdk/pull/12855 From matsaave at openjdk.org Tue Mar 14 13:59:48 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 14 Mar 2023 13:59:48 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
> > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Typo in comment - Merge branch 'master' into resolvedIndyEntry_8301995 - Interpreter optimization and comments - PPC and RISCV port - 8301995: Move invokedynamic resolution information out of the cpCache ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/c2d87e59..a3e7ca02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=01-02 Stats: 92608 lines in 1481 files changed: 72908 ins; 8825 del; 10875 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Tue Mar 14 14:54:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 14 Mar 2023 14:54:06 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: References: 
<-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> Message-ID: <6ungKcriyVh3xBdyFAA7AwOHgNMAO8E1fWeGi1Ap3gA=.fc8c1ee2-66a4-4324-be69-f186360efb5a@github.com> On Tue, 14 Mar 2023 13:18:40 GMT, Coleen Phillimore wrote: > Ok, never mind, I saw s390 port but it doesn't seem to be in these changes (?) It is not in these changes. @offamitkumar is working on s390x. It is not yet finished though. (I wasn't aware that putting the URL of this PR into a comment somewhere else adds a comment to this PR) ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dfuchs at openjdk.org Tue Mar 14 15:07:23 2023 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Tue, 14 Mar 2023 15:07:23 GMT Subject: RFR: 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. [v2] In-Reply-To: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> References: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> Message-ID: On Fri, 3 Mar 2023 11:46:49 GMT, Kevin Walls wrote: >> Deprecate the Java Management Extension (JMX) Subject Delegation feature for removal in a future release. >> >> Given no known usage, there is no replacement feature for JMX Subject Delegation. >> >> CSR is https://bugs.openjdk.org/browse/JDK-8298967 > > Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - deprecation text update > - Revert "RMIConnection throw comments" > > This reverts commit aceb4fe44189245ac702f0c74c2bb1100a6d17fa. 
> - Merge remote-tracking branch 'upstream/master' into Deprecate_SubjectDelegation > - RMIConnection throw comments > - 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. Marked as reviewed by dfuchs (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11880 From dnsimon at openjdk.org Tue Mar 14 15:13:00 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 14 Mar 2023 15:13:00 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: <23n-dTVRiGuVl7imPvKph7q43FuB1k7Hak6-mGNDKeM=.40ca325c-e53f-4950-bece-99b7e4f4d367@github.com> On Tue, 14 Mar 2023 13:12:31 GMT, Frederic Parain wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaField.java line 48: >> >>> 46: * Returns VM internal flags associated with this field >>> 47: */ >>> 48: int getInternalModifiers(); >> >> We've never exposed the internal modifiers before in public JVMCI API and we should refrain from doing so until there's a good reason to do so. Please remove this method. > > Access to internal modifiers is needed in `HotSpotResolvedJavaFieldTest.testEquivalenceForInternalFields()`. I moved the declaration of the method to `HotSpotResolvedJavaField`. Does this change work for you? Just use reflection to read the internal flags (like this test already does for the `index` field). I've attached [review.patch](https://github.com/openjdk/jdk/files/10970245/review.patch) with this change and a few other changes I think should be made for better naming (plus one test cleanup). 
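The reflection approach suggested in the reply above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the JVMCI test code: `FieldHolder` and its private `internalFlags` field are hypothetical stand-ins for the HotSpot field implementation class and whatever field actually holds the internal flags.

```java
import java.lang.reflect.Field;

public class ReflectiveFlagRead {
    // Hypothetical stand-in for a class that keeps its flags private
    // and exposes no public accessor for them.
    public static final class FieldHolder {
        private final int internalFlags;

        public FieldHolder(int internalFlags) {
            this.internalFlags = internalFlags;
        }
    }

    // Reads a private int field by name, so a test can inspect the value
    // without a new method being added to a public interface.
    public static int readPrivateInt(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.getInt(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        FieldHolder holder = new FieldHolder(0x20);
        System.out.println(readPrivateInt(holder, "internalFlags")); // prints 32
    }
}
```

The trade-off is the usual one for test-only reflection: the public API stays unchanged, but the test breaks at run time rather than compile time if the field is renamed.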
------------- PR: https://git.openjdk.org/jdk/pull/12855 From rkennke at openjdk.org Tue Mar 14 15:47:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 15:47:29 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: Message-ID: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: SA fixes related to latest changes in LockStack ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/87b95bf7..2f572097 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=23-24 Stats: 15 lines in 1 file changed: 4 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From gcao at openjdk.org Tue Mar 14 16:18:21 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 14 Mar 2023 16:18:21 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 13:59:48 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Typo in comment > - Merge branch 'master' into resolvedIndyEntry_8301995 > - Interpreter optimization and comments > - PPC and RISCV port > - 8301995: Move invokedynamic resolution information out of the cpCache Hi, I have updated the riscv related code by referring to the latest aarch64 related changes, please help me to update it. https://github.com/zifeihan/jdk/commit/ca9f110ca4eb066f828442265f43ed0d9311a9cc (on this branch: https://github.com/zifeihan/jdk/commits/follow_12778) @RealFYang @DingliZhang Please help review the RISCV port code. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From kevinw at openjdk.org Tue Mar 14 17:00:01 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 14 Mar 2023 17:00:01 GMT Subject: RFR: 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. [v2] In-Reply-To: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> References: <8_8cW07IV0FyDZqEqhWYssSJ9BGKofanzIPRWFZJ4BM=.c3b0b7e7-fbd9-4b04-af2f-3ad1b929eb6c@github.com> Message-ID: On Fri, 3 Mar 2023 11:46:49 GMT, Kevin Walls wrote: >> Deprecate the Java Management Extension (JMX) Subject Delegation feature for removal in a future release. >> >> Given no known usage, there is no replacement feature for JMX Subject Delegation. >> >> CSR is https://bugs.openjdk.org/browse/JDK-8298967 > > Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - deprecation text update > - Revert "RMIConnection throw comments" > > This reverts commit aceb4fe44189245ac702f0c74c2bb1100a6d17fa. > - Merge remote-tracking branch 'upstream/master' into Deprecate_SubjectDelegation > - RMIConnection throw comments > - 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. Thanks Mandy and Daniel! ------------- PR: https://git.openjdk.org/jdk/pull/11880 From kevinw at openjdk.org Tue Mar 14 17:02:39 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 14 Mar 2023 17:02:39 GMT Subject: Integrated: 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. In-Reply-To: References: Message-ID: On Fri, 6 Jan 2023 12:02:37 GMT, Kevin Walls wrote: > Deprecate the Java Management Extension (JMX) Subject Delegation feature for removal in a future release. > > Given no known usage, there is no replacement feature for JMX Subject Delegation. > > CSR is https://bugs.openjdk.org/browse/JDK-8298967 This pull request has now been integrated. Changeset: 4e631fa4 Author: Kevin Walls URL: https://git.openjdk.org/jdk/commit/4e631fa43fd821846c12ae2177360c44cf770766 Stats: 9 lines in 2 files changed: 7 ins; 0 del; 2 mod 8298966: Deprecate JMX Subject Delegation and the method JMXConnector.getMBeanServerConnection(Subject) for removal. 
Reviewed-by: mchung, dfuchs ------------- PR: https://git.openjdk.org/jdk/pull/11880 From matsaave at openjdk.org Tue Mar 14 17:04:48 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 14 Mar 2023 17:04:48 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Tue, 14 Mar 2023 09:20:54 GMT, Richard Reingruber wrote: > @matias9927 can I ask you to merge master? There seem to be conflicts (at least I see a message "This branch has conflicts that must be resolved"). I'd like to give the change a spin in our CI testing. This requires that it can be applied on master. I saw that merge error but nothing came up when I tried to merge locally. The branch is updated nonetheless, so you should be able to test it now @reinrich ! ------------- PR: https://git.openjdk.org/jdk/pull/12778 From djelinski at openjdk.org Tue Mar 14 17:22:18 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Tue, 14 Mar 2023 17:22:18 GMT Subject: Integrated: 8303814: getLastErrorString should avoid charset conversions In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:30:27 GMT, Daniel Jeliński wrote: > This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. > > Other changes include: > - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. > - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set.
> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. > - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. > - `getLastErrorString` is no longer exported by libjava. > > Tier1-3 tests continue to pass. > > No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. > Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. This pull request has now been integrated. Changeset: baf11e73 Author: Daniel Jeliński URL: https://git.openjdk.org/jdk/commit/baf11e734f7b5308490edc74f3168744c0857b24 Stats: 151 lines in 8 files changed: 21 ins; 44 del; 86 mod 8303814: getLastErrorString should avoid charset conversions Reviewed-by: naoto, cjplummer, rriggs ------------- PR: https://git.openjdk.org/jdk/pull/12922 From jwaters at openjdk.org Tue Mar 14 17:50:01 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 14 Mar 2023 17:50:01 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v3] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 15:55:27 GMT, Daniel Jeliński wrote: >> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows.
>> >> Other changes include: >> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. >> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. >> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. >> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. >> - `getLastErrorString` is no longer exported by libjava. >> >> Tier1-3 tests continue to pass. >> >> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page. >> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. > > Daniel Jeliński has updated the pull request incrementally with one additional commit since the last revision: > > Use NULL where appropriate The change in Windows behaviour seems like a worrying gotcha that someone using the method might be trapped by (C library errors and system errors are reported for Unix while only WINAPI is returned on Windows).
Although this has already been pushed I'd still like to mention that me and Thomas did have quite a bit of discussion regarding the conundrum on Windows about whether to report both when the error routines are called or have separate methods/mechanism to select either, see [8292016](https://bugs.openjdk.org/browse/JDK-8292016?filter=-1) ------------- PR: https://git.openjdk.org/jdk/pull/12922 From djelinski at openjdk.org Tue Mar 14 18:20:17 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Tue, 14 Mar 2023 18:20:17 GMT Subject: RFR: 8303814: getLastErrorString should avoid charset conversions [v3] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 15:55:27 GMT, Daniel Jeliński wrote: >> This patch modifies the `getLastErrorString` method to return a `jstring`. Thanks to that we can avoid unnecessary back and forth conversions between Unicode and other charsets on Windows. >> >> Other changes include: >> - the Windows implementation of `getLastErrorString` no longer checks `errno`. I verified all uses of the method and confirmed that `errno` is not used anywhere. >> - While at it, I found and fixed a few calls to `JNU_ThrowIOExceptionWithLastError` that were done in context where `LastError` was not set. >> - jdk.hotspot.agent was modified to use `JNU_ThrowByNameWithLastError` and `JNU_ThrowByName` instead of `getLastErrorString`; the code is expected to have identical behavior. >> - zip_util was modified to return static messages instead of generated ones. The generated messages were not observed anywhere, because they were replaced by a static message in ZIP_Open, which is the only method used by other native code. >> - `getLastErrorString` is no longer exported by libjava. >> >> Tier1-3 tests continue to pass. >> >> No new automated regression test; testing this requires installing a language pack that cannot be displayed in the current code page.
>> Tested this manually by installing Chinese language pack on English Windows 11, selecting Chinese language, then checking if the message on exception thrown by `InetAddress.getByName("nonexistent.local");` starts with `"不知道这样的主机。"` (or `"\u4e0d\u77e5\u9053\u8fd9\u6837\u7684\u4e3b\u673a\u3002"`). Without the change, the exception message started with a row of question marks. > > Daniel Jeliński has updated the pull request incrementally with one additional commit since the last revision: > > Use NULL where appropriate we never use any of the JNU_XXX functions to report errno on Windows as far as I could tell. And even if we did, we'd need to call SetLastError(0) before JNU_Throw to make it work, which we never did. I think we're ok here. ------------- PR: https://git.openjdk.org/jdk/pull/12922 From dcubed at openjdk.org Tue Mar 14 18:22:28 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 14 Mar 2023 18:22:28 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 15:47:29 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking.
>> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. 
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> SA fixes related to latest changes in LockStack

I kicked off a round of Mach5 Tier1 testing last night. I got 133 SA test failures that are probably fixed by v24. runtime/logging/MonitorInflationTest.java also failed on all 5 configs tested in Tier1:

java.lang.RuntimeException: 'inflate(has_locker):' missing from stdout/stderr
    at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221)
    at MonitorInflationTest.analyzeOutputOn(MonitorInflationTest.java:41)
    at MonitorInflationTest.main(MonitorInflationTest.java:56)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
    at java.base/java.lang.Thread.run(Thread.java:1623)

I suspect that the failing condition is one that I added to the test a long time ago so I'll be taking a look.
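For readers following the quoted PR description, the thread-local lock-stack idea can be modeled roughly as below. This is an illustrative Java sketch under stated assumptions, not HotSpot's C++ implementation: the class name `LockStackModel`, the method names, and the initial capacity of 4 are all made up for illustration, and the object-header CAS, monitor inflation, and GC-root scanning are left out entirely.

```java
import java.util.Arrays;

public class LockStackModel {
    // One lock-stack per thread, mirroring the "thread-local" property.
    public static final ThreadLocal<LockStackModel> CURRENT =
            ThreadLocal.withInitial(LockStackModel::new);

    private Object[] slots = new Object[4]; // assumed initial capacity
    private int top = 0;

    // "Does the current thread own me?" answered by a linear scan,
    // which is cheap because the stack is expected to stay small.
    public boolean contains(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == o) {
                return true;
            }
        }
        return false;
    }

    // Models a fast-lock attempt. Returns false on a recursive attempt,
    // where the described scheme would inflate to a full monitor instead.
    public boolean tryFastLock(Object o) {
        if (contains(o)) {
            return false;
        }
        if (top == slots.length) {
            slots = Arrays.copyOf(slots, top * 2); // grow when needed
        }
        slots[top++] = o;
        return true;
    }

    // Models unlock: removes the object if held and reports whether it was.
    public boolean unlock(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == o) {
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null; // clear the vacated slot
                return true;
            }
        }
        return false;
    }

    public int size() {
        return top;
    }

    public static void main(String[] args) {
        LockStackModel stack = CURRENT.get();
        Object lock = new Object();
        System.out.println(stack.tryFastLock(lock)); // first acquisition succeeds
        System.out.println(stack.tryFastLock(lock)); // recursive attempt is rejected
        System.out.println(stack.unlock(lock));
    }
}
```

The linear scan is the point of the design as described: ownership by the current thread is answered from a handful of thread-local slots, without touching the upper bits of the object header.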
------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Tue Mar 14 18:22:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 18:22:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 18:16:36 GMT, Daniel D. Daugherty wrote: > I kicked off a round of Mach5 Tier1 testing last night. I got 133 SA test failures that are probably fixed by v24. runtime/logging/MonitorInflationTest.java also failed on all 5 configs tested in Tier1: > > ``` > java.lang.RuntimeException: 'inflate(has_locker):' missing from stdout/stderr > at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221) > at MonitorInflationTest.analyzeOutputOn(MonitorInflationTest.java:41) > at MonitorInflationTest.main(MonitorInflationTest.java:56) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:578) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:1623) > ``` > > I suspect that the failing condition is one that I added to the test a long time ago so I'll be taking a look. Aww, that is bad timing. I pushed a change yesterday that broke SA, and I only pushed a fix 2 hours ago. It should be good now, in case you want to try it again. Thank you for your effort to review and test this change! Roman ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Tue Mar 14 18:44:56 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 14 Mar 2023 18:44:56 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 15:47:29 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > SA fixes related to latest changes in LockStack I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set.
------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Tue Mar 14 18:52:39 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 18:52:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack.
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Change log message when inflating fast-locked object ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/2f572097..b834f0ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Tue Mar 14 18:52:42 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 18:52:42 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 18:41:59 GMT, Daniel D. Daugherty wrote: > I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set.
I just now pushed a simple change that changes the log message 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Tue Mar 14 18:52:45 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 14 Mar 2023 18:52:45 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 15:47:29 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > SA fixes related to latest changes in LockStack And it looks like you just pushed a fix for: runtime/logging/MonitorInflationTest.java. I killed my Mach5 Tier1 and I'll resync again...
------------- PR: https://git.openjdk.org/jdk/pull/10907 From dnsimon at openjdk.org Tue Mar 14 19:49:48 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 14 Mar 2023 19:49:48 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: On Tue, 14 Mar 2023 13:37:23 GMT, Frederic Parain wrote: >> src/hotspot/share/jvmci/jvmciEnv.hpp line 149: >> >>> 147: }; >>> 148: >>> 149: extern JNIEXPORT jobjectArray c2v_getDeclaredFieldsInfo(JNIEnv* env, jobject, jobject, jlong); >> >> What's the purpose of this declaration? I don't think you need it or the `friend` declaration below since `new_FieldInfo(FieldInfo* fieldinfo, JVMCI_TRAPS)` is public.
> > Without this declaration, builds fail on Windows with this error: > `error C2375: 'c2v_getDeclaredFieldsInfo': redefinition; different linkage` Strange - that's not needed for other `JVMCIEnv` methods called from `jvmciCompilerToVM.cpp`. There must be some way to avoid this. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Tue Mar 14 19:49:49 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 19:49:49 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: <0SZEwskweBz1Ri6krqHq9rGWvmwiQ9fkgfRcUEKtTuo=.67dbf4d3-4432-4187-a39d-f16ef76cc2ce@github.com> On Tue, 14 Mar 2023 15:11:36 GMT, Doug Simon wrote: >> Without this declaration, builds fail on Windows with this error: >> `error C2375: 'c2v_getDeclaredFieldsInfo': redefinition; different linkage` > > Strange - that's not needed for other `JVMCIEnv` methods called from `jvmciCompilerToVM.cpp`. There must be some way to avoid this. The issue was caused by the `friend` declaration below (I cannot remember why I added it in the first place), which seems to add an implicit declaration of the method that was conflicting with the original declaration of the method. Once the `friend` declaration is removed, builds on Windows don't need the `extern` declaration anymore. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From dcubed at openjdk.org Tue Mar 14 20:06:01 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Tue, 14 Mar 2023 20:06:01 GMT Subject: RFR: 8304172: ProblemList serviceability/sa/UniqueVtableTest.java Message-ID: Trivial fixes to ProblemList a couple of tests: [JDK-8304172](https://bugs.openjdk.org/browse/JDK-8304172) ProblemList serviceability/sa/UniqueVtableTest.java [JDK-8304175](https://bugs.openjdk.org/browse/JDK-8304175) ProblemList compiler/vectorapi/VectorLogicalOpIdentityTest.java on 2 platforms ------------- Commit messages: - 8304175: ProblemList compiler/vectorapi/VectorLogicalOpIdentityTest.java on 2 platforms - 8304172: ProblemList serviceability/sa/UniqueVtableTest.java Changes: https://git.openjdk.org/jdk/pull/13029/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13029&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304172 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13029.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13029/head:pull/13029 PR: https://git.openjdk.org/jdk/pull/13029 From azvegint at openjdk.org Tue Mar 14 20:06:03 2023 From: azvegint at openjdk.org (Alexander Zvegintsev) Date: Tue, 14 Mar 2023 20:06:03 GMT Subject: RFR: 8304172: ProblemList serviceability/sa/UniqueVtableTest.java In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 19:52:01 GMT, Daniel D. Daugherty wrote: > Trivial fixes to ProblemList a couple of tests: > > [JDK-8304172](https://bugs.openjdk.org/browse/JDK-8304172) ProblemList serviceability/sa/UniqueVtableTest.java > [JDK-8304175](https://bugs.openjdk.org/browse/JDK-8304175) ProblemList compiler/vectorapi/VectorLogicalOpIdentityTest.java on 2 platforms Marked as reviewed by azvegint (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/13029 From dcubed at openjdk.org Tue Mar 14 20:09:13 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 14 Mar 2023 20:09:13 GMT Subject: RFR: 8304172: ProblemList serviceability/sa/UniqueVtableTest.java In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 20:00:30 GMT, Alexander Zvegintsev wrote: >> Trivial fixes to ProblemList a couple of tests: >> >> [JDK-8304172](https://bugs.openjdk.org/browse/JDK-8304172) ProblemList serviceability/sa/UniqueVtableTest.java >> [JDK-8304175](https://bugs.openjdk.org/browse/JDK-8304175) ProblemList compiler/vectorapi/VectorLogicalOpIdentityTest.java on 2 platforms > > Marked as reviewed by azvegint (Reviewer). @azvegint - Thanks for the fast review! ------------- PR: https://git.openjdk.org/jdk/pull/13029 From dcubed at openjdk.org Tue Mar 14 20:13:18 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 14 Mar 2023 20:13:18 GMT Subject: Integrated: 8304172: ProblemList serviceability/sa/UniqueVtableTest.java In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 19:52:01 GMT, Daniel D. Daugherty wrote: > Trivial fixes to ProblemList a couple of tests: > > [JDK-8304172](https://bugs.openjdk.org/browse/JDK-8304172) ProblemList serviceability/sa/UniqueVtableTest.java > [JDK-8304175](https://bugs.openjdk.org/browse/JDK-8304175) ProblemList compiler/vectorapi/VectorLogicalOpIdentityTest.java on 2 platforms This pull request has now been integrated. Changeset: 617c15f5 Author: Daniel D. 
Daugherty URL: https://git.openjdk.org/jdk/commit/617c15f5a131fdf254fc4277f6dd78d64292db1c Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8304172: ProblemList serviceability/sa/UniqueVtableTest.java 8304175: ProblemList compiler/vectorapi/VectorLogicalOpIdentityTest.java on 2 platforms Reviewed-by: azvegint ------------- PR: https://git.openjdk.org/jdk/pull/13029 From matsaave at openjdk.org Tue Mar 14 20:20:41 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 14 Mar 2023 20:20:41 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: RISCV port update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/a3e7ca02..db892223 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=02-03 Stats: 23 lines in 2 files changed: 5 ins; 12 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From fparain at openjdk.org Tue Mar 14 20:32:53 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 20:32:53 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: <23n-dTVRiGuVl7imPvKph7q43FuB1k7Hak6-mGNDKeM=.40ca325c-e53f-4950-bece-99b7e4f4d367@github.com> References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> <23n-dTVRiGuVl7imPvKph7q43FuB1k7Hak6-mGNDKeM=.40ca325c-e53f-4950-bece-99b7e4f4d367@github.com> Message-ID: On Tue, 14 Mar 2023 15:10:04 GMT, Doug Simon wrote: >> Access to internal modifiers is needed in `HotSpotResolvedJavaFieldTest.testEquivalenceForInternalFields()`. I moved the declaration of the method to `HotSpotResolvedJavaField`. Does this change work for you? > > Just use reflection to read the internal flags (like this test already does for the `index` field). > > I've attached [review.patch](https://github.com/openjdk/jdk/files/10970245/review.patch) with this change and a few other changes I think should be made for better naming (plus one test cleanup). Thank you for the patch, it will be included in the next commit. 
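
The reflection approach suggested in the review above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not code from the PR: `FieldHolder` and its `internalFlags` field are hypothetical stand-ins for a class whose internal state has no public accessor, and `readIntField` is a helper invented for this sketch.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a class whose internal state has no public accessor.
final class FieldHolder {
    private final int internalFlags;

    FieldHolder(int internalFlags) {
        this.internalFlags = internalFlags;
    }
}

final class ReflectiveFlagsReader {
    // Reads a private int field by name, so a test does not need a new production accessor.
    static int readIntField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // succeeds here because both classes live in the same module
            return field.getInt(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        FieldHolder holder = new FieldHolder(42);
        System.out.println(readIntField(holder, "internalFlags"));
    }
}
```

When the target class lives in a different named module, the package would additionally have to be opened to the test (for example with `--add-opens`), which this sketch does not cover.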
------------- PR: https://git.openjdk.org/jdk/pull/12855 From lmesnik at openjdk.org Tue Mar 14 21:56:09 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 14 Mar 2023 21:56:09 GMT Subject: Integrated: 8303705: Field sleeper.started should be volatile JdbLockTestTarg.java In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 03:23:05 GMT, Leonid Mesnik wrote: > Field has been made volatile. This pull request has now been integrated. Changeset: cd41c69d Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/cd41c69d4484f900a89a71f1c9bab2bc2e383c1e Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8303705: Field sleeper.started should be volatile JdbLockTestTarg.java Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13010 From amenkov at openjdk.org Tue Mar 14 22:12:36 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 14 Mar 2023 22:12:36 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out Message-ID: The change: - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. 
Tested: 100 runs on all platforms, no failures ------------- Commit messages: - UniqueVtableTest Changes: https://git.openjdk.org/jdk/pull/13030/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13030&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303921 Stats: 75 lines in 6 files changed: 37 ins; 22 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/13030.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13030/head:pull/13030 PR: https://git.openjdk.org/jdk/pull/13030 From cjplummer at openjdk.org Tue Mar 14 22:54:12 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 14 Mar 2023 22:54:12 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:05:44 GMT, Alex Menkov wrote: > The change: > - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; > - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. > > Tested: 100 runs on all platforms, no failures Changes requested by cjplummer (Reviewer). test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 158: > 156: Long.toString(lingeredAppPid)); > 157: SATestUtils.addPrivilegesIfNeeded(processBuilder); > 158: OutputAnalyzer SAOutput = ProcessTools.executeProcess(processBuilder); `SAOutput`: local variables should start with lower case. test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 168: > 166: try { > 167: app = LingeredApp.startApp(); > 168: createAnotherToAttach(app.getPid()); Did you ever figure out why attaching from the main test process sometimes fails? 
test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 195: > 193: } else { > 194: runTest(Long.parseLong(args[0])); > 195: } Could use some comments here. Also, I think `SATestUtils.skipIfCannotAttach` is only needed for the `else` part. ------------- PR: https://git.openjdk.org/jdk/pull/13030 From dcubed at openjdk.org Tue Mar 14 23:09:41 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 14 Mar 2023 23:09:41 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:05:44 GMT, Alex Menkov wrote: > The change: > - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; > - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. > > Tested: 100 runs on all platforms, no failures test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 34: > 32: * jdk.hotspot.agent/sun.jvm.hotspot.types.basic > 33: * > 34: * @run driver UniqueVtableTest The other tests you touched in this PR use: `@run main/othervm ...` so why did this one have to change to: `@run driver ...` ------------- PR: https://git.openjdk.org/jdk/pull/13030 From amenkov at openjdk.org Tue Mar 14 23:27:45 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 14 Mar 2023 23:27:45 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 23:07:13 GMT, Daniel D. 
Daugherty wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 34: > >> 32: * jdk.hotspot.agent/sun.jvm.hotspot.types.basic >> 33: * >> 34: * @run driver UniqueVtableTest > > The other tests you touched in this PR use: > `@run main/othervm ...` > so why did this one have to change to: > `@run driver ...` Due to the changes in the test, it no longer needs to be run in "othervm" mode; "driver" is ok now and makes the test a bit faster. I didn't change the mode for the other tests. ------------- PR: https://git.openjdk.org/jdk/pull/13030 From ccheung at openjdk.org Tue Mar 14 23:37:46 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 14 Mar 2023 23:37:46 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 20:20:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > RISCV port update Looks good. Just a few minor comments. src/hotspot/share/interpreter/bootstrapInfo.cpp line 218: > 216: _indy_index, > 217: pool()->tag_at(_bss_index), > 218: CHECK_false); Please indent lines 216-218 like before. src/hotspot/share/interpreter/bootstrapInfo.cpp line 234: > 232: if (_indy_index > -1) { > 233: os::snprintf_checked(what, sizeof(what), "indy#%d", _indy_index); > 234: } Since the `else` case doesn't have braces, maybe omit the braces for this case as well? src/hotspot/share/oops/cpCache.cpp line 618: > 616: indy_resolution_failed(), parameter_size()); > 617: if ((bytecode_1() == Bytecodes::_invokehandle)) { > 618: constantPoolHandle cph(Thread::current(), cache->constant_pool()); There is another `cph` defined at line 601. Could that one be used? src/hotspot/share/oops/cpCache.cpp line 652: > 650: int size = ConstantPoolCache::size(length); > 651: > 652: // Initialize resolvedinvokedynamicinfo array with available data Maybe break up the long word `resolvedinvokedynamicinfo`? src/hotspot/share/oops/cpCache.cpp line 653: > 651: > 652: // Initialize resolvedinvokedynamicinfo array with available data > 653: Array* array; Suggestion: rename `array` to `resolved_indy_entries`.
src/hotspot/share/oops/cpCache.cpp line 664: > 662: > 663: return new (loader_data, size, MetaspaceObj::ConstantPoolCacheType, THREAD) > 664: ConstantPoolCache(length, index_map, invokedynamic_map, array); I think it reads better if this line is indented to right after the open parenthesis. src/hotspot/share/prims/methodComparator.cpp line 119: > 117: if ((old_cp->name_ref_at(index_old) != new_cp->name_ref_at(index_new)) || > 118: (old_cp->signature_ref_at(index_old) != new_cp->signature_ref_at(index_new))) > 119: return false; Please adjust the indentations of lines 118 and 119 to be the same as lines 124 and 125. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/interpreter/BytecodeWithCPIndex.java line 61: > 59: } else { > 60: return cpCache.getEntryAt((int) (0xFFFF & cpCacheIndex)).getConstantPoolIndex(); > 61: } Maybe align all `return` statements with line 56? src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyArray.java line 38: > 36: public class ResolvedIndyArray extends GenericArray { > 37: static { > 38: VM.registerVMInitializedObserver(new Observer() { Indentation for java code should be 4 spaces. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyEntry.java line 38: > 36: private static long size; > 37: private static long baseOffset; > 38: private static CIntegerField cpIndex; Indentation for java code should be 4 spaces. 
------------- PR: https://git.openjdk.org/jdk/pull/12778 From amenkov at openjdk.org Tue Mar 14 23:39:10 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 14 Mar 2023 23:39:10 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:49:07 GMT, Chris Plummer wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 168: > >> 166: try { >> 167: app = LingeredApp.startApp(); >> 168: createAnotherToAttach(app.getPid()); > > Did you ever figure out why attaching from the main test process sometimes fails? No. It looks like JTReg executes main test process differently than the sub-process is executed, but I didn't find difference which can cause attach timeouts. 
------------- PR: https://git.openjdk.org/jdk/pull/13030 From amenkov at openjdk.org Tue Mar 14 23:44:24 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 14 Mar 2023 23:44:24 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out In-Reply-To: References: Message-ID: <0_yf8Z6uJjCJyKkRCxHPPWZag73calQQWYC-eJk9CDg=.f4ecafc5-34b6-4d1c-a949-798461818c0a@github.com> On Tue, 14 Mar 2023 22:48:30 GMT, Chris Plummer wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 195: > >> 193: } else { >> 194: runTest(Long.parseLong(args[0])); >> 195: } > > Could use some comments here. Also, I think `SATestUtils.skipIfCannotAttach` is only needed for the `else` part. "else" part is a sub-process. As far as I understand it SATestUtils.skipIfCannotAttach can be skipped for "else", but it's needed for main process. 
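
The parent/sub-process split discussed above (no arguments means the main test process, one argument means the sub-process that attaches) can be sketched as below. This is a simplified illustration using only `ProcessBuilder` from the standard library, not the jtreg test library the real test uses; the class `SelfRelaunchSketch` and its helper methods are hypothetical, and the "attach" step is only a placeholder print.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

final class SelfRelaunchSketch {
    // Builds the command line that re-runs this class in a fresh JVM,
    // passing the target pid as the only argument.
    static List<String> childCommand(long targetPid) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        return List.of(javaBin,
                       "-cp", System.getProperty("java.class.path"),
                       SelfRelaunchSketch.class.getName(),
                       Long.toString(targetPid));
    }

    // No arguments: parent process. One argument: child process with the pid to attach to.
    static boolean isChild(String[] args) {
        return args.length > 0;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (!isChild(args)) {
            // Parent: environment checks (the "can we attach at all?" kind) belong here,
            // before any target or child process is started.
            long targetPid = ProcessHandle.current().pid(); // placeholder target for the sketch
            Process child = new ProcessBuilder(childCommand(targetPid)).inheritIO().start();
            System.exit(child.waitFor());
        } else {
            // Child: the real test would attach to the target process here.
            long targetPid = Long.parseLong(args[0]);
            System.out.println("child would attach to pid " + targetPid);
        }
    }
}
```

Keeping the argument check in one small method, with a comment on each branch, is one way to avoid the "read it backwards" confusion mentioned later in the thread.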
------------- PR: https://git.openjdk.org/jdk/pull/13030 From amenkov at openjdk.org Wed Mar 15 00:34:00 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 15 Mar 2023 00:34:00 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: References: Message-ID: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> > The change: > - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; > - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. > > Tested: 100 runs on all platforms, no failures Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13030/files - new: https://git.openjdk.org/jdk/pull/13030/files/69cc6dae..0e8573d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13030&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13030&range=00-01 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13030.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13030/head:pull/13030 PR: https://git.openjdk.org/jdk/pull/13030 From amenkov at openjdk.org Wed Mar 15 00:37:16 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 15 Mar 2023 00:37:16 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:50:32 GMT, Chris Plummer wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback > > test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java 
line 158: > >> 156: Long.toString(lingeredAppPid)); >> 157: SATestUtils.addPrivilegesIfNeeded(processBuilder); >> 158: OutputAnalyzer SAOutput = ProcessTools.executeProcess(processBuilder); > > `SAOutput`: local variables should start with lower case. Fixed ------------- PR: https://git.openjdk.org/jdk/pull/13030 From amenkov at openjdk.org Wed Mar 15 00:37:18 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 15 Mar 2023 00:37:18 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: <0_yf8Z6uJjCJyKkRCxHPPWZag73calQQWYC-eJk9CDg=.f4ecafc5-34b6-4d1c-a949-798461818c0a@github.com> References: <0_yf8Z6uJjCJyKkRCxHPPWZag73calQQWYC-eJk9CDg=.f4ecafc5-34b6-4d1c-a949-798461818c0a@github.com> Message-ID: On Tue, 14 Mar 2023 23:41:47 GMT, Alex Menkov wrote: >> test/hotspot/jtreg/serviceability/sa/UniqueVtableTest.java line 195: >> >>> 193: } else { >>> 194: runTest(Long.parseLong(args[0])); >>> 195: } >> >> Could use some comments here. Also, I think `SATestUtils.skipIfCannotAttach` is only needed for the `else` part. > > "else" part is a sub-process. > As far as I understand it SATestUtils.skipIfCannotAttach can be skipped for "else", but it's needed for main process. Added comment. Left SATestUtils.skipIfCannotAttach as is (this is consistent with other SA tests) ------------- PR: https://git.openjdk.org/jdk/pull/13030 From dcubed at openjdk.org Wed Mar 15 01:28:32 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Mar 2023 01:28:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 18:52:39 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking.
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact.
But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change enables to simplify (and speed-up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Change log message when inflating fast-locked object

I did Mach5 Tier{1,2,3} on v25. Please see the bug report for the gory details:

Tier1 - 1 known, unrelated failure
Tier2 - 4 closed, unknown, related test failures
Tier3 - 8 closed, unknown, related test failures; 2 open, known, unrelated test failures; 16 open, unknown, related test failures

I'm pausing my Mach5 testing at this point.
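
For readers following the quoted PR description, the thread-local lock-stack it describes can be modelled in a few lines of Java. This is only an illustrative model of the data structure, not HotSpot code (the real implementation is C++ and works on object headers); the class name `LockStackSketch`, its methods, the initial capacity of 4, and the doubling growth policy are all assumptions made for this sketch.

```java
import java.util.Arrays;

// Illustrative model only: a small per-thread array of object references.
final class LockStackSketch {
    // One lock-stack per thread, mirroring the "thread-local lock-stack" in the description.
    static final ThreadLocal<LockStackSketch> CURRENT =
            ThreadLocal.withInitial(LockStackSketch::new);

    private Object[] elements = new Object[4];
    private int top = 0;

    // Records that the current thread fast-locked the object; grows the array when full.
    void push(Object lockee) {
        if (top == elements.length) {
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[top++] = lockee;
    }

    // Removes and returns the most recently locked object.
    Object pop() {
        if (top == 0) {
            throw new IllegalStateException("lock-stack is empty");
        }
        Object lockee = elements[--top];
        elements[top] = null; // clear the slot so the model does not keep the object reachable
        return lockee;
    }

    // Answers "does the current thread own me?" with a short scan, comparing by identity.
    boolean contains(Object lockee) {
        for (int i = top - 1; i >= 0; i--) {
            if (elements[i] == lockee) {
                return true;
            }
        }
        return false;
    }

    int depth() {
        return top;
    }
}
```

Because the array usually holds only a handful of entries, the linear scan in `contains` stays cheap, which matches the argument in the description for why ownership queries by the current thread are fast.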
------------- PR: https://git.openjdk.org/jdk/pull/10907 From cjplummer at openjdk.org Wed Mar 15 01:29:56 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 15 Mar 2023 01:29:56 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: References: <0_yf8Z6uJjCJyKkRCxHPPWZag73calQQWYC-eJk9CDg=.f4ecafc5-34b6-4d1c-a949-798461818c0a@github.com> Message-ID: On Wed, 15 Mar 2023 00:34:32 GMT, Alex Menkov wrote: >> "else" part is a sub-process. >> As far as I understand it SATestUtils.skipIfCannotAttach can be skipped for "else", but it's needed for main process. > > Added comment. > Left SATestUtils.skipIfCannotAttach as is (this is consistent with other SA tests) > "else" part is a sub-process. Ok, I read it backwards. Part of the reason I asked for comments. :) ------------- PR: https://git.openjdk.org/jdk/pull/13030 From dholmes at openjdk.org Wed Mar 15 01:54:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 01:54:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 18:46:47 GMT, Roman Kennke wrote: >> I've reviewed the changes in v23 and v24. Trying another >> Mach5 Tier1 job set. > >> I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set. > > I just now pushed a simple change that changes the log message 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy. @rkennke this still seems to be very much a work-in-progress rather than actual PR review at this stage. Perhaps it should move back to draft until you actually have something you think is ready for integration? 
------------- PR: https://git.openjdk.org/jdk/pull/10907 From gcao at openjdk.org Wed Mar 15 02:07:25 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 15 Mar 2023 02:07:25 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 20:20:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > RISCV port update Changes requested by gcao (Author).
------------- PR: https://git.openjdk.org/jdk/pull/12778 From gcao at openjdk.org Wed Mar 15 02:07:29 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 15 Mar 2023 02:07:29 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 13:59:48 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: > > - Typo in comment > - Merge branch 'master' into resolvedIndyEntry_8301995 > - Interpreter optimization and comments > - PPC and RISCV port > - 8301995: Move invokedynamic resolution information out of the cpCache src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1843: > 1841: ldr(cache, Address(rcpool, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); > 1842: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) > 1843: mov(tmp, sizeof(ResolvedIndyEntry)); The tmp register is not used here, is it redundant? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dholmes at openjdk.org Wed Mar 15 02:44:21 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 02:44:21 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> Message-ID: <5v8dU5lgn0OFL07BSu5DH4WLC-6wvttVGwtnuQS1NCE=.4571773b-6fdb-4a69-b117-21f1896a3060@github.com> On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > feedback Not sure removing the build directives was the right way to go. 
As per the jtreg tag guide: > A test that relies upon library classes should contain appropriate @build directives to ensure that the classes will be compiled. It is strongly recommended that tests do not rely on the use of implicit compilation by the Java compiler. so the problem is likely caused by missing build directives in the test(s) that fails. ------------- PR: https://git.openjdk.org/jdk/pull/13030 From duke at openjdk.org Wed Mar 15 04:09:04 2023 From: duke at openjdk.org (liach) Date: Wed, 15 Mar 2023 04:09:04 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2] In-Reply-To: References: Message-ID: > Summaries: > 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. > 3. Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? 
liach has updated the pull request incrementally with one additional commit since the last revision: Shorten lines, move from mask() to ACC_ constants, other misc improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/837ea4bb..c6536bf9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=00-01 Stats: 196 lines in 19 files changed: 59 ins; 26 del; 111 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009 From fyang at openjdk.org Wed Mar 15 07:43:32 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 15 Mar 2023 07:43:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 18:46:47 GMT, Roman Kennke wrote: >> I've reviewed the changes in v23 and v24. Trying another >> Mach5 Tier1 job set. > >> I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set. > > I just now pushed a simple change that changes the log message 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy. @rkennke : Hi, I have prepared some extra changes for RISC-V to make it work. See attachment. BTW: You might also want to use -w instructions in MacroAssembler::fast_unlock for aarch64. 
[more-riscv-changes.txt](https://github.com/openjdk/jdk/files/10977109/more-riscv-changes.txt) ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Wed Mar 15 09:41:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 15 Mar 2023 09:41:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention.
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - More RISCV changes (by Fei Yang) - Use -w instructions in fast_unlock() - Increase stub size of C2HandleAnonOwnerStub to 18 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/b834f0ca..0ad01c1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=25-26 Stats: 136 lines in 7 files changed: 62 ins; 28 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Wed Mar 15 09:41:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 15 Mar 2023 09:41:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 01:25:33 GMT, Daniel D. Daugherty wrote: > I did Mach5 Tier{1,2,3} on v25. Please see the bug report for the gory details: > > Tier1 - 1 known, unrelated failure Tier2 - 4 closed, unknown, related test failures Tier3 - 8 closed, unknown, related test failures; 2 open, known, unrelated test failures; 16 open, unknown, related test failures > > I'm pausing my Mach5 testing at this point. Hi Daniel, I could not reproduce any of the test failures, neither on x86_64 nor on aarch64. I have blindly fixed the code stub size issue, it seems rather trivial. Would it be possible to open/send me the failing test that triggers vframeArray assert or extract a reproducer that you could publish? I looked at it but could not figure out what could be going on there.
Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10907 From fparain at openjdk.org Wed Mar 15 13:45:46 2023 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 15 Mar 2023 13:45:46 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 01:25:01 GMT, Chris Plummer wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 75: > >> 73: int initialValueIndex; >> 74: int genericSignatureIndex; >> 75: int contendedGroup; > > It seems that these should all be shorts. All the getter methods are casting them to short. Indexes in the constant pool are unsigned shorts, but Java shorts are signed, using ints is the simplest way to store those indexes. > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 108: > >> 106: CLASS_STATE_INITIALIZATION_ERROR = db.lookupIntConstant("InstanceKlass::initialization_error").intValue(); >> 107: // We need a new fieldsCache each time we attach. >> 108: fieldsCache = new HashMap(); > > This should probably be a WeakHashMap. I tried it and it seems to work (or at least didn't cause any problems). However, when doing a heap dump I didn't notice the table being any smaller on exit when it was made weak, even though there were numerous GC's while dumping the heap. > > The `` is the Address of the hotspot InstanceKlass instance, and this Address is referenced by the SA InstanceKlass mirror. So theoretically when the reference to the mirror goes way, then the cache entry can be cleared. I've changed the map to a WeakHashMap and didn't see any issue during testing. But I didn't measure footprint. 
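For readers unfamiliar with the WeakHashMap suggestion above, here is a minimal sketch of the idea, not the actual SA code. The class name `WeakFieldsCacheSketch`, the `Object` key type and the cached `String` value are all assumptions made for illustration; the real fieldsCache is keyed by the Address of the InstanceKlass. An entry becomes eligible for removal only once its key is no longer strongly reachable, and the timing of that removal is up to the GC, which may be why no footprint difference was visible in the heap-dump experiment.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a per-attach cache; not the SA implementation.
public final class WeakFieldsCacheSketch {
    // Keys are held weakly: once a key is no longer strongly reachable
    // elsewhere, the GC is allowed (but not obliged) to drop its entry.
    private final Map<Object, String> cache = new WeakHashMap<>();

    // Returns the cached value for key, computing it on first use.
    // The value must not strongly reference the key, otherwise the
    // entry could never be cleared.
    public String lookup(Object key) {
        return cache.computeIfAbsent(key, k -> "fields@" + System.identityHashCode(k));
    }

    public int size() {
        return cache.size();
    }
}
```

While the caller keeps a strong reference to the key, repeated lookups return the same cached value; nothing in this sketch says when, or whether, an unreferenced entry is actually reclaimed.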
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 325: > >> 323: >> 324: public int getFieldOffset(int index) { >> 325: return (int)getField(index).getOffset(); > > Cast to int is not needed Other APIs (like MetadataField) are using longs to pass offsets; doing a cast here is less disruptive than changing all the other APIs. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From matsaave at openjdk.org Wed Mar 15 15:38:10 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 15:38:10 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 23:29:17 GMT, Calvin Cheung wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> RISCV port update > > src/hotspot/share/interpreter/bootstrapInfo.cpp line 234: > >> 232: if (_indy_index > -1) { >> 233: os::snprintf_checked(what, sizeof(what), "indy#%d", _indy_index); >> 234: } > > Since the `else` case doesn't have braces, maybe omit the braces for this case as well? The if statements below use braces so I think it would be better to add braces to the else case. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From fparain at openjdk.org Wed Mar 15 15:41:17 2023 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 15 Mar 2023 15:41:17 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency.
> > The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: SA and JVMCI fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/12b4f1b4..f81337f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=03-04 Stats: 130 lines in 13 files changed: 14 ins; 46 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From jlu at openjdk.org Wed Mar 15 16:08:03 2023 From: jlu at openjdk.org (Justin Lu) Date: Wed, 15 Mar 2023 16:08:03 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native Message-ID: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> This PR converts Unicode sequences to UTF-8 native in .properties file.
(Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. ------------- Commit messages: - Write to ASCII - Read in .properties as UTF-8, but write to LRB .java as ISO-8859-1 - Compile class with ascii (Not ready to make system wide change) - Toggle UTF-8 for javac option in JavaCompilation.gmk - CompileProperties converts in UTF-8 - Convert .properties from ISO-8859-1 to UTF-8 Changes: https://git.openjdk.org/jdk/pull/12726/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301991 Stats: 29093 lines in 490 files changed: 6 ins; 0 del; 29087 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726 From jjg at openjdk.org Wed Mar 15 16:08:06 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Wed, 15 Mar 2023 16:08:06 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu wrote: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. 
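As a rough illustration of the conversion step described above (read the .properties input as UTF-8, then emit text that is safe to write as ASCII), a sketch might look as follows. This is not the CompileProperties code; the class `PropertiesEncodingSketch` and both of its methods are hypothetical names used only here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper; names are illustrative only.
public final class PropertiesEncodingSketch {

    // Decodes the bytes explicitly as UTF-8 before handing them to Properties.
    // Properties.load(InputStream) would instead assume ISO-8859-1.
    public static Properties load(byte[] utf8Bytes) {
        Properties props = new Properties();
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Replaces every non-ASCII char with a \\uXXXX escape so that the
    // result can be written with an ASCII encoder without loss.
    public static String toAsciiEscaped(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }
}
```

Escaping on output keeps the generated .java source pure ASCII regardless of the encoding javac is told to expect, which appears to be the direction the review discussion below settles on.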
make/langtools/tools/compileproperties/CompileProperties.java line 252: > 250: try { > 251: writer = new BufferedWriter( > 252: new OutputStreamWriter(new FileOutputStream(outputPath), StandardCharsets.ISO_8859_1)); Using ISO_8859_1 seems strange. Since these are generated files, you could write them as UTF-8 and then override the default javac option for ascii when compiling _just_ these files. Or else just stay with ascii; no one should be looking at these files! ------------- PR: https://git.openjdk.org/jdk/pull/12726 From matsaave at openjdk.org Wed Mar 15 16:08:21 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 16:08:21 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 15:39:39 GMT, Gui Cao wrote: >> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Typo in comment >> - Merge branch 'master' into resolvedIndyEntry_8301995 >> - Interpreter optimization and comments >> - PPC and RISCV port >> - 8301995: Move invokedynamic resolution information out of the cpCache > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1843: > >> 1841: ldr(cache, Address(rcpool, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); >> 1842: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) >> 1843: mov(tmp, sizeof(ResolvedIndyEntry)); > > The tmp register is not used here, is it redundant? Right, the tmp register is not needed anymore thanks to the mul to shift optimization. Note that shifting will not be possible on 32-bit systems due to the size of ResolvedIndyEntry not being a power of two. This optimization only works on 64 bit builds. 
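To make the mul-to-shift point concrete, here is a small sketch in Java rather than in the interpreter's assembler. The class `EntryIndexScaling` and the entry sizes used in the usage note are illustrative assumptions, not the real sizeof(ResolvedIndyEntry) on any platform. Scaling an index by the entry size can be done with a shift only when that size is a power of two; otherwise a multiply is needed.

```java
// Hypothetical helper illustrating index scaling; not HotSpot code.
public final class EntryIndexScaling {

    // Returns the byte offset of entry number index in an array of
    // fixed-size entries of entrySize bytes.
    public static long byteOffset(int index, int entrySize) {
        if (entrySize <= 0) {
            throw new IllegalArgumentException("entrySize must be positive");
        }
        if (Integer.bitCount(entrySize) == 1) {
            // Power of two: index * entrySize == index << log2(entrySize).
            return (long) index << Integer.numberOfTrailingZeros(entrySize);
        }
        // Not a power of two: fall back to an explicit multiply.
        return (long) index * entrySize;
    }
}
```

With an assumed 16-byte entry, `byteOffset(3, 16)` takes the shift path and yields 48; with an assumed 12-byte entry, `byteOffset(3, 12)` takes the multiply path and yields 36.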
------------- PR: https://git.openjdk.org/jdk/pull/12778 From jlu at openjdk.org Wed Mar 15 16:08:07 2023 From: jlu at openjdk.org (Justin Lu) Date: Wed, 15 Mar 2023 16:08:07 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: <_dP9N3UNWa82tfLVEapoSFJjbvMmlyP21ZbuL0NjTDU=.3685af0b-31a0-42aa-86b0-5098bda72766@github.com> On Tue, 7 Mar 2023 23:15:14 GMT, Jonathan Gibbons wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. > > make/langtools/tools/compileproperties/CompileProperties.java line 252: > >> 250: try { >> 251: writer = new BufferedWriter( >> 252: new OutputStreamWriter(new FileOutputStream(outputPath), StandardCharsets.ISO_8859_1)); > > Using ISO_8859_1 seems strange. > Since these are generated files, you could write them as UTF-8 and then override the default javac option for ascii when compiling _just_ these files. > > Or else just stay with ascii; no one should be looking at these files! Will stick with your latter solution, as since the .properties files were converted via native2ascii, it makes sense to write out via ascii. ------------- PR: https://git.openjdk.org/jdk/pull/12726 From duke at openjdk.org Wed Mar 15 16:21:33 2023 From: duke at openjdk.org (Archie L. 
Cobbs) Date: Wed, 15 Mar 2023 16:21:33 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: <1I9v8d2OiyLfQVCozGYVRhAi3AotqGuRUhsNj0VCsUk=.e673ca33-d24f-4aab-908e-a5c0bfa3bf7c@github.com> On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu wrote: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1: > 1: # This file should probably be excluded because it's used in a test that relates to UTF-8 encoding (or not) of property files. ------------- PR: https://git.openjdk.org/jdk/pull/12726 From matsaave at openjdk.org Wed Mar 15 16:35:22 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 16:35:22 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v5] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed indentation and other comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/db892223..415e7116 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=03-04 Stats: 71 lines in 9 files changed: 1 ins; 3 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From cjplummer at openjdk.org Wed Mar 15 16:41:16 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 15 Mar 2023 16:41:16 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: <_thEXXKYB00W5Mmg8hGRKMLqo6vog84sjtp2Mqf2wqk=.8f5a76a6-30c5-4a05-ae7a-753e1d70ddee@github.com> On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. 
>> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > SA and JVMCI fixes SA changes looks good. Thanks for taking care of this! ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855 From rrich at openjdk.org Wed Mar 15 16:53:40 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Mar 2023 16:53:40 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Tue, 14 Mar 2023 17:01:20 GMT, Matias Saavedra Silva wrote: > > @matias9927 can I ask you to merge master? There seem to be conflicts (at least I see a message "This branch has conflicts that must be resolved"). I'd like to give the change a spin in our CI testing. This requires that it can be applied on master.
> > I saw that merge error but nothing came up when I tried to merge locally. The branch is updated nonetheless, so you should be able to test it now @reinrich ! Thanks. The testing didn't reveal anything. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Wed Mar 15 18:45:00 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 18:45:00 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed aarch64 interpreter mistake ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/415e7116..9a3a63ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From sspitsyn at openjdk.org Wed Mar 15 18:53:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 Mar 2023 18:53:24 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> Message-ID: On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow the standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's a known JTReg issue that "@build" actions for part of the used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > feedback This looks pretty good. How did you test the fix? Does it never fail with your fix? 
Thanks, Serguei ------------- PR: https://git.openjdk.org/jdk/pull/13030 From dcubed at openjdk.org Wed Mar 15 18:56:35 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Mar 2023 18:56:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: <3lU8tL9eqZfwn3qgslK3WcjAInuVFsuY2X1vpukzbJI=.53255494-3359-46fb-a311-30cd59091b7b@github.com> On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). 
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. 
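The lock-stack mechanism described in the quoted text above can be illustrated with a toy model. This is a sketch in Java under stated assumptions, not the HotSpot implementation: `LockStackSketch`, `Header`, `LockStack`, and the integer constants are hypothetical names, an `AtomicInteger` stands in for the lock bits of an object header, and the stack uses a fixed capacity for simplicity.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of fast-locking with a per-thread lock-stack; not HotSpot code. */
final class LockStackSketch {
    // Hypothetical stand-ins for the low header bits; 00 means fast-locked, as in the description.
    static final int FAST_LOCKED = 0;
    static final int UNLOCKED = 1;
    static final int INFLATED = 2;

    /** Stand-in for an object header: only the lock bits are modelled. */
    static final class Header {
        final AtomicInteger lockBits = new AtomicInteger(UNLOCKED);
    }

    /** Small per-thread array of the objects this thread has fast-locked. */
    static final class LockStack {
        private final Header[] slots;
        private int top;

        LockStack(int capacity) {
            slots = new Header[capacity];
        }

        boolean isFull() {
            return top == slots.length;
        }

        /** Answers "does the current thread own me?" with a linear scan. */
        boolean contains(Header h) {
            for (int i = 0; i < top; i++) {
                if (slots[i] == h) return true;
            }
            return false;
        }

        void push(Header h) {
            slots[top++] = h;
        }

        void remove(Header h) {
            for (int i = 0; i < top; i++) {
                if (slots[i] == h) {
                    System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                    slots[--top] = null;
                    return;
                }
            }
        }
    }

    /** Returns false when the caller must take a slow path (full stack, recursion, or contention). */
    static boolean tryFastLock(LockStack stack, Header h) {
        if (stack.isFull() || stack.contains(h)) return false;
        if (!h.lockBits.compareAndSet(UNLOCKED, FAST_LOCKED)) return false;
        stack.push(h);
        return true;
    }

    /**
     * Returns false when this thread does not hold the fast-lock, or when the lock was
     * inflated in the meantime and a real runtime would have to exit through the monitor.
     */
    static boolean tryFastUnlock(LockStack stack, Header h) {
        if (!stack.contains(h)) return false;
        stack.remove(h);
        return h.lockBits.compareAndSet(FAST_LOCKED, UNLOCKED);
    }
}
```

In this model the header never stores an owner, which is why a contending thread that inflates the lock cannot name the owner and the description resorts to the ANONYMOUS_OWNER marker.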
In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. 
This would be implemented in a separate PR
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 change | aarch64 stack-locking | aarch64 fast-locking | aarch64 change |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision:
>
> - More RISCV changes (by Fei Yang)
> - Use -w instructions in fast_unlock()
> - Increase stub size of C2HandleAnonOwnerStub to 18

Reviewed the v26 changes except for the riscv files.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From mdoerr at openjdk.org Wed Mar 15 19:07:34 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 15 Mar 2023 19:07:34 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: > 3396: const Bytecodes::Code code = bytecode(); > 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; > 3398: const bool is_invokedynamic = code == false; // should not reach here with invokedynamic This looks strange! I guess you wanted to delete more? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dcubed at openjdk.org Wed Mar 15 19:17:38 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Mar 2023 19:17:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:36:25 GMT, Roman Kennke wrote: > Would it be possible to open/send me the failing test that triggers vframeArray assert > or extract a reproducer that you could publish? I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. 
------------- PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Wed Mar 15 19:35:25 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 Mar 2023 19:35:25 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 12:26:16 GMT, Markus Grönlund wrote: >> I've had a good look through now and have a better sense of the refactoring. Seems good. >> >> I'll wait for any tweaks before hitting the approve button though. >> >> Thanks > >> I've had a good look through now and have a better sense of the refactoring. Seems good. >> >> >> >> I'll wait for any tweaks before hitting the approve button though. >> >> >> >> Thanks > > Thanks so much for taking a look. I realized that implementation details of loading should probably reside in agent.cpp, not agentList.cpp. > > I am currently off on vacation and will update when back. Thanks also to Andrew Dinn for comments. @mgronlun I'm looking at the fixes but it will take some time. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From rkennke at openjdk.org Wed Mar 15 19:43:34 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 15 Mar 2023 19:43:34 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 19:14:09 GMT, Daniel D. Daugherty wrote: > > Would it be possible to open/send me the failing test that triggers vframeArray assert > > or extract a reproducer that you could publish? > > I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. Thank you! Regarding moving this PR back to draft, I am not sure. I can do that, yes. But really the fundamental algorithm and implementation have been essentially fixed for half a year already. I have re-worked it into a fresh PR based on the request to put it behind a flag. 
The recent change to a fixed-size lock-stack has probably invalidated part of your previous reviews, and I am sorry for that. On the upside, it removed a lot of complexity in the JIT compilers and assembly code generators. What else do I expect to happen? Thomas is working on an ARM(32) port, but this is quite separate and could even land after this PR is done. I still don't quite like the naming. Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. It is an alternative (and less racy, on the object header) way to implement a thin-locking layer before inflating monitors, that is all. So maybe -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is that when we eventually switch to Lilliput turned on by default, we would entirely remove stack-locking. I would also add some code in arguments.cpp to keep this new thin locking turned off on platforms that don't yet support it. Besides that, from my POV, it is pretty much done. What do you think? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From amenkov at openjdk.org Wed Mar 15 20:05:27 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 15 Mar 2023 20:05:27 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: <5v8dU5lgn0OFL07BSu5DH4WLC-6wvttVGwtnuQS1NCE=.4571773b-6fdb-4a69-b117-21f1896a3060@github.com> References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> <5v8dU5lgn0OFL07BSu5DH4WLC-6wvttVGwtnuQS1NCE=.4571773b-6fdb-4a69-b117-21f1896a3060@github.com> Message-ID: On Wed, 15 Mar 2023 02:41:18 GMT, David Holmes wrote: > Not sure removing the build directives was the right way to go. As per the jtreg tag guide: > > > A test that relies upon library classes should contain appropriate @build directives to ensure that the classes will be compiled. 
It is strongly recommended that tests do not rely on the use of implicit compilation by the Java compiler. > > so the problem is likely caused by missing build directives in the test(s) that fails. This is a long-standing issue with NoClassDefFoundError for library classes. As far as I can tell from reading similar issues, there was an attempt to add build directives to the failing tests, but other tests from the same directory started to fail with NoClassDefFoundError later, and now most of the tests have no build action for libraries. It seems to me that it's much simpler to remove the build action from 4 tests in the directory than to add it to the other 55 ------------- PR: https://git.openjdk.org/jdk/pull/13030 From amenkov at openjdk.org Wed Mar 15 20:10:19 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 15 Mar 2023 20:10:19 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> Message-ID: On Wed, 15 Mar 2023 18:50:25 GMT, Serguei Spitsyn wrote: > This looks pretty good. How did you test the fix? Does it never fail with your fix? Thanks, Serguei I ran test jobs for "serviceability/sa" 100 times on all platforms, no failures (neither attach timeout nor NoClassDefFoundError) ------------- PR: https://git.openjdk.org/jdk/pull/13030 From naoto at openjdk.org Wed Mar 15 20:23:23 2023 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 15 Mar 2023 20:23:23 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu wrote: > This PR converts Unicode sequences to UTF-8 native in .properties files. (Excluding the Unicode space and tab sequence). 
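As background to the quoted conversion, the equivalence between an escaped value such as `zh_CN=\uFFE5` and its native spelling can be checked with the standard `java.util.Properties.load(Reader)` method. The following is a small sketch, not the JDK build logic: the class `Utf8PropertiesSketch` and its helper methods are hypothetical, and a `StringReader` stands in for a file opened with an explicit UTF-8 charset.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: an escaped value and its native spelling load to the same string. */
final class Utf8PropertiesSketch {

    /** Loads properties from a character stream; the caller chooses the charset when creating the Reader. */
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True when both lines define the same value for the given key. */
    static boolean sameValue(String key, String escapedLine, String nativeLine) {
        String fromEscaped = load(new StringReader(escapedLine)).getProperty(key);
        String fromNative = load(new StringReader(nativeLine)).getProperty(key);
        return fromEscaped != null && fromEscaped.equals(fromNative);
    }
}
```

For a real file, the Reader would typically come from `Files.newBufferedReader(path, StandardCharsets.UTF_8)`, which makes the encoding explicit rather than relying on the ISO 8859-1 assumption of `Properties.load(InputStream)`.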
The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156: > 154: zh=\u00A4 > 155: zh_CN=\uFFE5 > 156: zh_HK=HK$ Why are they not encoded into UTF-8 native? ------------- PR: https://git.openjdk.org/jdk/pull/12726 From dcubed at openjdk.org Wed Mar 15 20:58:38 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Mar 2023 20:58:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: <14eqGd_d9yW5aXDqAYrCnohm4cB7tACgAKc_qDsTJGA=.33c510e9-63f0-4a5f-aebe-bc77f0ffbefc@github.com> On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: > > - More RISCV changes (by Fei Yang) > - Use -w instructions in fast_unlock() > - Increase stub size of C2HandleAnonOwnerStub to 18 Personally, I'm fine with leaving this PR in the non-draft/ready-to-review state. However, that's because I'm very much in sync (no pun intended) with where this code is at currently. I catch up on the latest changes everyday and I've started doing Mach5 test cycles everyday. When I return to Orlando, I will start doing stress testing runs in my lab. For other folks that started reviewing earlier than I did, they may have a different POV. I do see the change to a fixed-size lock-stack as a wonderful improvement because it got rid of so many changes. I think the project went from 74 changed and new files down to 51 changed and new files. As for naming, I don't have any suggestions at the moment. I believe @dholmes-ora has commented in other reviews that "Naming is hard..." 
------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Wed Mar 15 21:32:36 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Mar 2023 21:32:36 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' 
is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. 
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. 
Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision:
>
> - More RISCV changes (by Fei Yang)
> - Use -w instructions in fast_unlock()
> - Increase stub size of C2HandleAnonOwnerStub to 18

I proposed NewStyleThinLocks or ThinLocks2. Anything really that's easy to grep for and to distinguish from the old stack-based locks. I can live with RomanStyleLocks :) If things work out, this is a transient state anyway. I also think Roman should just decide and name it.

Getting rid of the growing part of LockStack is a relief. The missing ports are no showstoppers; we can do them later (I'll work on arm, but I'm swamped atm, so this may take a week or so).

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From sspitsyn at openjdk.org Thu Mar 16 05:24:59 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Mar 2023 05:24:59 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics Message-ID: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com>

This is needed for performance improvements in support of virtual threads.
The update includes the following:

1. Refactored the `VirtualThread` native methods:
   `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount`
   `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount`
2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`:
   `JVM_VirtualThreadMountStart` => `VTMS_mount_begin`
   `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
   `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin`
   `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end`
3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`.
5.
Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. Testing: - Ran mach5 tiers 1-6. No regressions were found. ------------- Commit messages: - fix traling spaces in a couple of files - minor update for VTMS_notify_jvmti_events variable - 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics Changes: https://git.openjdk.org/jdk/pull/13054/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304303 Stats: 438 lines in 20 files changed: 265 ins; 125 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054 From dholmes at openjdk.org Thu Mar 16 05:48:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 05:48:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 19:40:33 GMT, Roman Kennke wrote: >>> Would it be possible to open/send me the failing test that triggers vframeArray assert >>> or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take >> to move that test from closed to open. Will keep you posted. > >> > Would it be possible to open/send me the failing test that triggers vframeArray assert >> > or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. > > Thank you! > > Regarding moving this PR back to draft, I am not sure. I can do that, yes. 
But really the fundamental algorithm and implementation is basically fixed since half a year already. I have re-worked it into a fresh PR based on the request to put it behind a flag. The recent change to a fixed-size lock-stack has probably invalidated part of your previous reviews, and I am sorry for that. On the upside, it removed a lot of complexity in the JIT compilers and assembly code generators. > > What else do I expect to happen? > > Thomas is working on an ARM(32) port, but this is quite separate and could even land after this PR is done. > > I still don't quite like the naming. Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. It is an alternative (and less racy, on the object header) way to implement a thin-locking layer before inflating monitors, that is all. So maybe -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is that when we eventually switch to Lilliput turned on by default, we would entirely remove stack-locking. > > I would also add some code in arguments.cpp to keep this new thin locking turned off on platforms that don't yet support it. > > Besides that, from my POV, it is pretty much done. > > What do you think? @rkennke The changed to fixed-size lock stack was a significant change as you note and that suggested to me that the design was still in flux. So I have to wonder whether everything is in fact now stable? (or as stable as one should expect for an experimental new feature) > Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. 
Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 06:08:34 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 06:08:34 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> On Thu, 16 Mar 2023 05:45:29 GMT, David Holmes wrote: > Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks They don't use the stack anymore; would this not be us using a wrong name just for history's sake? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 06:34:35 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 06:34:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 19:40:33 GMT, Roman Kennke wrote: >>> Would it be possible to open/send me the failing test that triggers vframeArray assert >>> or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take >> to move that test from closed to open. Will keep you posted. > >> > Would it be possible to open/send me the failing test that triggers vframeArray assert >> > or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. > > Thank you! > > Regarding moving this PR back to draft, I am not sure. I can do that, yes. 
But really the fundamental algorithm and implementation is basically fixed since half a year already. I have re-worked it into a fresh PR based on the request to put it behind a flag. The recent change to a fixed-size lock-stack has probably invalidated part of your previous reviews, and I am sorry for that. On the upside, it removed a lot of complexity in the JIT compilers and assembly code generators. > > What else do I expect to happen? > > Thomas is working on an ARM(32) port, but this is quite separate and could even land after this PR is done. > > I still don't quite like the naming. Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. It is an alternative (and less racy, on the object header) way to implement a thin-locking layer before inflating monitors, that is all. So maybe -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is that when we eventually switch to Lilliput turned on by default, we would entirely remove stack-locking. > > I would also add some code in arguments.cpp to keep this new thin locking turned off on platforms that don't yet support it. > > Besides that, from my POV, it is pretty much done. > > What do you think? > @rkennke The changed to fixed-size lock stack was a significant change as you note and that suggested to me that the design was still in flux. So I have to wonder whether everything is in fact now stable? (or as stable as one should expect for an experimental new feature) I think it is, except for the few points that I mentioned earlier, and anything that comes up in reviews, I don't expect any major design changes. In fact, I would actively hold them back if anything comes up, to move this PR across the line at this point. I can't think of any bad spots where I thunk 'oh this is ugly - this needs a better approach' though. > > Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. 
> > > > Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks That's fine by me. Thank you, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 06:34:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 06:34:36 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> References: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> Message-ID: On Thu, 16 Mar 2023 06:05:38 GMT, Thomas Stuefe wrote: > > > > Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks > > > > They don't use the stack anymore; would this not be us using a wrong name just for history's sake? Well, it's still got the lock-stacks. :-D ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 06:42:32 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 06:42:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> Message-ID: On Thu, 16 Mar 2023 06:31:42 GMT, Roman Kennke wrote: > > > Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. 
Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks > > > > > > They don't use the stack anymore; would this not be us using a wrong name just for history's sake? > > Well, it's still got the lock-stacks. :-D Yes, but we have variables like "is_stack_locked" etc, without an "old" qualifier. Idk, up to you. Better than UseFastLocking I guess. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dholmes at openjdk.org Thu Mar 16 07:02:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 07:02:23 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> <5v8dU5lgn0OFL07BSu5DH4WLC-6wvttVGwtnuQS1NCE=.4571773b-6fdb-4a69-b117-21f1896a3060@github.com> Message-ID: On Wed, 15 Mar 2023 20:02:11 GMT, Alex Menkov wrote: > It seems to me that it's much simpler to remove build action from 4 tests in the directory than add it for other 55 True. Sadly we keep getting bitten over and over by CODETOOLS-7902847. Sometimes the "fix" is to remove build directives (in Hotspot we switched from adding to removing back in 2017) and sometimes it is to add them (there have been some recent fixes that have done this - including by me). ------------- PR: https://git.openjdk.org/jdk/pull/13030 From sspitsyn at openjdk.org Thu Mar 16 07:11:00 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Mar 2023 07:11:00 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v2] In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> Message-ID: > This is needed for performance improvements in support of virtual threads. 
> The update includes the following:
>
> 1. Refactored the `VirtualThread` native methods:
>    `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount`
>    `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount`
> 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`:
>    `JVM_VirtualThreadMountStart` => `VTMS_mount_begin`
>    `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
>    `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin`
>    `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end`
> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
> 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`.
> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable.
>
> Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now.
>
> Testing:
> - Ran mach5 tiers 1-6. No regressions were found.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: include jniHandles.hpp into sharedRuntime.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13054/files - new: https://git.openjdk.org/jdk/pull/13054/files/8a379320..397b6337 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054 From dholmes at openjdk.org Thu Mar 16 07:34:36 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 07:34:36 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
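For readers following the thread, the lock-stack idea from the quoted PR text can be sketched as a toy model in Java. This is only an illustration under stated assumptions, not HotSpot code: the names `LockStackSketch`, `Obj`, `LockStack`, `fastLock`, `owns` and `fastUnlock` are made up for the sketch, the header is reduced to its lock bits, only the `00` value for 'fast-locked' comes from the PR text (the `UNLOCKED` value is an assumption), the capacity is fixed as in the later revision mentioned in this thread, and unlocking is assumed to happen in LIFO order.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the fast-locking idea; NOT HotSpot code. All names are hypothetical.
final class LockStackSketch {
    static final int FAST_LOCKED = 0b00; // from the PR text: low two header bits are 00
    static final int UNLOCKED = 0b01;    // assumed value, chosen for this sketch only

    /** Stand-in for a Java object whose header carries nothing but the lock bits. */
    static final class Obj {
        final AtomicInteger header = new AtomicInteger(UNLOCKED);
    }

    /** Per-thread array of the objects this thread currently holds fast-locked. */
    static final class LockStack {
        private final Obj[] slots;
        private int top = 0;

        LockStack(int capacity) {
            slots = new Obj[capacity];
        }

        /** Returns false whenever a real VM would have to leave the fast path. */
        boolean fastLock(Obj o) {
            if (top == slots.length) {
                return false; // lock-stack is full
            }
            if (!o.header.compareAndSet(UNLOCKED, FAST_LOCKED)) {
                return false; // already locked: contended or recursive
            }
            slots[top++] = o; // record ownership thread-locally, not in the header
            return true;
        }

        /** 'Does the current thread own me?' answered by a linear scan. */
        boolean owns(Obj o) {
            for (int i = 0; i < top; i++) {
                if (slots[i] == o) {
                    return true;
                }
            }
            return false;
        }

        /** Releases the most recently fast-locked object (LIFO order assumed). */
        void fastUnlock(Obj o) {
            if (top == 0 || slots[top - 1] != o) {
                throw new IllegalStateException("unbalanced unlock");
            }
            slots[--top] = null;
            o.header.set(UNLOCKED);
        }
    }
}
```

In this model a second `fastLock` on an object the thread already holds fails, which corresponds to the point in the description where a recursive fast-lock gets inflated to a full monitor; the inflation itself and the ANONYMOUS_OWNER hand-over are deliberately left out.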
>> [...]

I agree "stack lock" is not ideal even though we do still have lock-stacks. :) But "thin locks" is definitely out, and "fast lock" seems generally agreed to be meaningless. How about just UseNewLocks ? Or UseNewMarkWordLocks?

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Thu Mar 16 08:03:38 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 08:03:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID:

On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
>> [...]

I like UseNewLocks but fear that this may conflict with Oracles plan (?) to move OMs into heap, which would be another revamp of locking - fat locks in this case - and may come with yet another switch. Other than that, UseNewLocks sounds good and succinct.
Another proposal: UseThreadLockStack or UseLockStack

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rehn at openjdk.org Thu Mar 16 08:15:38 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 16 Mar 2023 08:15:38 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 08:00:38 GMT, Thomas Stuefe wrote:

> I like UseNewLocks but fear that this may conflict with Oracle's plan (?) to move OMs into the heap, which would be another revamp of locking - fat locks in this case - and may come with yet another switch. Other than that, UseNewLocks sounds good and succinct.
>
> Another proposal: UseThreadLockStack or UseLockStack

Just an FYI, at the moment we have:

  product(ccstr, ObjectSynchronizerMode, "fast", \
          "ObjectSynchronizer modes: " \
          "legacy: legacy native system; " \
          "native: java entry with native monitors; " \
          "heavy: java entry with always inflated Java monitors; " \
          "fast: java entry with fast-locks and" \
          " inflate-on-demand Java monitors; ") \

Personally, I prefer one option over many. With a command line such as `-XX:-UseLockStack -XX:+UseHeavyMonitors`, it is harder, for me at least, to figure out what is selected and what was intended to be selected.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Mar 16 08:39:39 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 16 Mar 2023 08:39:39 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27]
In-Reply-To:
References:
Message-ID:

On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. 
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. 
But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? 
| 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: > > - More RISCV changes (by Fei Yang) > - Use -w instructions in fast_unlock() > - Increase stub size of C2HandleAnonOwnerStub to 18 I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. 
release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 08:52:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 08:52:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: >> >> - More RISCV changes (by Fei Yang) >> - Use -w instructions in fast_unlock() >> - Increase stub size of C2HandleAnonOwnerStub to 18 > > I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear. @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top. 
-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Mar 16 09:05:39 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 16 Mar 2023 09:05:39 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27]
In-Reply-To:
References:
Message-ID: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com>

On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke wrote:

>> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision:
>>
>> - More RISCV changes (by Fei Yang)
>> - Use -w instructions in fast_unlock()
>> - Increase stub size of C2HandleAnonOwnerStub to 18
>
> I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear.

> @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top.

C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block.
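
The difference between the two unlock paths discussed here can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code from the PR: `LockStackModel`, `pop_top` and `remove` are hypothetical names, the fixed capacity of 8 is arbitrary, and the way the interpreter-style path searches the stack is just one plausible way to tolerate unstructured unlocking.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a per-thread lock-stack (not HotSpot code).
struct LockStackModel {
    static constexpr std::size_t kCapacity = 8;  // arbitrary for this sketch
    const void* elems[kCapacity] = {};
    std::size_t top = 0;

    void push(const void* obj) {
        assert(top < kCapacity);
        elems[top++] = obj;
    }

    // Compiled-code style unlock: simply pop the top entry. The debug-only
    // assertion checks the assumption that locking is symmetric, i.e. that
    // the object being unlocked is the most recently locked one.
    void pop_top(const void* expected) {
        (void)expected;  // unused when assertions are compiled out
        assert(top > 0 && "unlock without a matching lock");
        assert(elems[top - 1] == expected && "asymmetric unlock");
        --top;
    }

    // Interpreter-style unlock (assumption): tolerate out-of-order unlocks
    // by searching for the object and closing the gap it leaves behind.
    bool remove(const void* obj) {
        for (std::size_t i = top; i > 0; --i) {
            if (elems[i - 1] == obj) {
                for (std::size_t j = i; j < top; ++j) {
                    elems[j - 1] = elems[j];
                }
                --top;
                return true;
            }
        }
        return false;  // obj was not on this thread's lock-stack
    }
};
```

With assertions enabled, calling `pop_top` with an object that is not on top aborts immediately, which is the kind of early failure the proposed check is meant to provide.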
-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Thu Mar 16 09:09:48 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 16 Mar 2023 09:09:48 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27]
In-Reply-To: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com>
References: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com>
Message-ID:

On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke wrote:

> > > @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top.
> > C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block.

Thanks for clarifying. Yes, asserting that would make sense.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rrich at openjdk.org Thu Mar 16 09:10:39 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 16 Mar 2023 09:10:39 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 15 Mar 2023 19:04:41 GMT, Martin Doerr wrote:

>> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fixed aarch64 interpreter mistake
>
> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398:
>
>> 3396:   const Bytecodes::Code code = bytecode();
>> 3397:   const bool is_invokeinterface = code == Bytecodes::_invokeinterface;
>> 3398:   const bool is_invokedynamic = code == false; // should not reach here with invokedynamic
>
> This looks strange! I guess you wanted to delete more?

Basically I kept the local variable as a name for the (now) constant value passed in the call at L3409.
The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is declared in a shared header. I could replace the variable reference in the call with `false /* is_invokedynamic */` if you like that better. Personally I'd prefer the current version. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From mdoerr at openjdk.org Thu Mar 16 09:29:26 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 09:29:26 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 09:07:27 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: >> >>> 3396: const Bytecodes::Code code = bytecode(); >>> 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; >>> 3398: const bool is_invokedynamic = code == false; // should not reach here with invokedynamic >> >> This looks strange! I guess you wanted to delete more? > > Basically I kept the local variable as a name for the (now) constant value passed in the call at L3409. > > The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is declared in a shared header. > > I could replace the variable reference in the call with `false /* is_invokedynamic */` if you like that better. Personally I'd prefer the current version. I meant `code == false`. That was probably not intended. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Thu Mar 16 09:29:27 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 09:29:27 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 09:21:26 GMT, Martin Doerr wrote: >> Basically I kept the local variable as a name for the (now) constant value passed in the call at L3409. 
>>
>> The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is declared in a shared header.
>>
>> I could replace the variable reference in the call with `false /* is_invokedynamic */` if you like that better. Personally I'd prefer the current version.
>
> I meant `code == false`. That was probably not intended.

Oh my ... You are right of course.

-------------

PR: https://git.openjdk.org/jdk/pull/12778

From rrich at openjdk.org Thu Mar 16 09:29:30 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 16 Mar 2023 09:29:30 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
In-Reply-To:
References:
Message-ID: <4xv3uef5CQ0pArU0jbJjWms_qd3akl8ZdbRs18CW31w=.51b759d7-d479-4174-92da-d4bda0500597@github.com>

On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
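
The layout described in the quoted summary (a dedicated array of invokedynamic entries, with the bytecode operand rewritten to an index into that array) can be pictured with a toy model. This is a sketch only: `IndyEntrySketch`, `CpCacheSketch` and their members are hypothetical names chosen for illustration and do not reflect the fields of the structure introduced by the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-call-site record: one entry per invokedynamic site,
// holding only invokedynamic data instead of overloaded f1/f2 fields.
struct IndyEntrySketch {
    std::uint16_t cp_index;       // original constant-pool index of the call site
    std::string resolved_target;  // empty until the call site has been resolved

    bool is_resolved() const { return !resolved_target.empty(); }
};

// Hypothetical constant-pool cache that owns the array of indy entries.
struct CpCacheSketch {
    std::vector<IndyEntrySketch> indy_entries;

    // "Rewriting": allocate a slot and return its index; that index is what
    // would replace the operand of the invokedynamic bytecode.
    std::size_t add_call_site(std::uint16_t cp_index) {
        indy_entries.push_back(IndyEntrySketch{cp_index, std::string()});
        return indy_entries.size() - 1;
    }

    // Resolution fills in the entry that the rewritten operand points at.
    void resolve(std::size_t index, std::string target) {
        indy_entries.at(index).resolved_target = std::move(target);
    }

    const IndyEntrySketch& entry_at(std::size_t index) const {
        return indy_entries.at(index);
    }
};
```

The point of the design is that a lookup becomes a plain array access by index, and each field of an entry has a single meaning.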
>>
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixed aarch64 interpreter mistake

src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398:

> 3396:   const Bytecodes::Code code = bytecode();
> 3397:   const bool is_invokeinterface = code == Bytecodes::_invokeinterface;
> 3398:   const bool is_invokedynamic = code == false; // should not reach here with invokedynamic

This is what I meant.
Suggestion:

  const bool is_invokedynamic = false; // should not reach here with invokedynamic

Thanks!

-------------

PR: https://git.openjdk.org/jdk/pull/12778

From rehn at openjdk.org Thu Mar 16 10:23:41 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 16 Mar 2023 10:23:41 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27]
In-Reply-To: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com>
References: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com>
Message-ID:

On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke wrote:

>> I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear.
>
>> @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top.
>
> C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block.

> > > @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top.
> > > > > > C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block.
> > Thanks for clarifying. Yes, asserting that would make sense.

FYI: I'm trying to convince folks that JVMS should be allowed to enforce asymmetric locking. We think most people don't know that such code ends up stuck in the interpreter, unintentionally. The latest discussion was about diagnosing and warning about this behavior as a first step.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Thu Mar 16 10:29:31 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 16 Mar 2023 10:29:31 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27]
In-Reply-To:
References: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com>
Message-ID:

On Thu, 16 Mar 2023 10:20:21 GMT, Robbin Ehn wrote:

> > > > > @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top.
> > > > > > > > > C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block.
> > > > > > Thanks for clarifying. Yes, asserting that would make sense.
> > FYI: I'm trying to convince folks that JVMS should be allowed to enforce asymmetric locking. We think most people don't know that such code ends up stuck in the interpreter, unintentionally. The latest discussion was about diagnosing and warning about this behavior as a first step.

Sounds good. Just to be clear, you mean enforce symmetric locking, i.e. forbid asymmetric locking?

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rehn at openjdk.org Thu Mar 16 11:00:35 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 16 Mar 2023 11:00:35 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27]
In-Reply-To:
References: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com>
Message-ID:

On Thu, 16 Mar 2023 10:26:26 GMT, Thomas Stuefe wrote:

> Sounds good. Just to be clear, you mean enforce symmetric locking, i.e. forbid asymmetric locking?

Yes, sorry, thanks for correcting! :)

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Mar 16 12:24:35 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 16 Mar 2023 12:24:35 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com>
References: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com>
Message-ID:

On Sat, 11 Mar 2023 14:57:19 GMT, Thomas Stuefe wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>> - Use nullptr instead of NULL in touched code (shared)
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234:
>
>> 6232:
>> 6233:   // Load (object->mark() | 1) into hdr
>> 6234:   orr(hdr, hdr, markWord::unlocked_value);
>
> I wondered why this is needed. Should we not have the header of an unlocked object in hdr? Or is this a safeguard against a misuse of this function (called with the header of an already locked object)?

It could be a monitor-locked header.
In C2 this is not possible and we *could* save an instruction here, I guess. Not sure if it is worth it, though. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 12:32:39 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 12:32:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> References: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> Message-ID: On Sat, 11 Mar 2023 14:52:54 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use nullptr instead of NULL in touched code (shared) > > src/hotspot/share/runtime/lockStack.hpp line 64: > >> 62: >> 63: // GC support >> 64: inline void oops_do(OopClosure* cl); > > Does this need to be nonconst? Yes, because the OopClosures can (and will) update the inline array elements. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 12:51:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 12:51:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking.
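
The fast-lock step described in the paragraphs above (check for lock-stack overflow, CAS the two lowest header bits to 00 while leaving the upper bits alone, then push the object onto the thread's lock-stack) might be modeled roughly as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: all names are hypothetical, the lock-stack has a fixed capacity instead of growing, and the encoding 01 for 'unlocked' is inferred from the `mark() | 1` comment quoted elsewhere in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header encoding for this sketch (not HotSpot code).
constexpr std::uintptr_t kLockMask   = 0x3;  // the two lowest header bits
constexpr std::uintptr_t kUnlocked   = 0x1;  // assumption: 01 means unlocked
constexpr std::uintptr_t kFastLocked = 0x0;  // 00 means fast-locked

struct ObjSketch {
    std::atomic<std::uintptr_t> header{kUnlocked};
};

// Hypothetical per-thread lock-stack with a fixed capacity.
struct LockStackSketch {
    static constexpr std::size_t kCapacity = 8;
    const ObjSketch* elems[kCapacity] = {};
    std::size_t top = 0;

    bool is_full() const { return top == kCapacity; }
    void push(const ObjSketch* obj) {
        assert(!is_full());
        elems[top++] = obj;
    }
    // "Does the current thread own me?" is a short linear scan.
    bool contains(const ObjSketch* obj) const {
        for (std::size_t i = 0; i < top; ++i) {
            if (elems[i] == obj) return true;
        }
        return false;
    }
};

enum class LockResult { FastLocked, NeedsSlowPath };

LockResult fast_lock(ObjSketch& obj, LockStackSketch& lock_stack) {
    // Overflow check first: a full lock-stack is handled by the slow path.
    if (lock_stack.is_full()) return LockResult::NeedsSlowPath;

    std::uintptr_t old_header = obj.header.load();
    // Anything but 'unlocked' (recursion, contention, monitor) is not
    // handled here and goes to the slow path.
    if ((old_header & kLockMask) != kUnlocked) return LockResult::NeedsSlowPath;

    // Only the two low bits change; the upper bits are not overloaded.
    std::uintptr_t new_header = (old_header & ~kLockMask) | kFastLocked;
    if (!obj.header.compare_exchange_strong(old_header, new_header)) {
        return LockResult::NeedsSlowPath;  // lost a race with another thread
    }
    lock_stack.push(&obj);
    return LockResult::FastLocked;
}
```

In this model a second `fast_lock` on the same object by the same thread returns `NeedsSlowPath`, mirroring the statement that recursive locking is not handled by the fast path.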
Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!)
a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Several changes (mostly cosmetic) in response to reviews

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/0ad01c1d..2445a19d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=27
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=26-27

  Stats: 21 lines in 11 files changed: 7 ins; 5 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From asotona at openjdk.org Thu Mar 16 13:59:21 2023
From: asotona at openjdk.org (Adam Sotona)
Date: Thu, 16 Mar 2023 13:59:21 GMT
Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 15 Mar 2023 04:09:04 GMT, Chen Liang wrote:

>> Summaries:
>> 1. A few recommendations about updating the constant API are made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch should the API changes be integrated first
>> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API.
>> 3. Most tests are included in tier1, but some are not:
>> In `:jdk_io`: (tier2, part 2)
>>
>> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java
>> test/jdk/java/io/Serializable/records/ProhibitedMethods.java
>> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java
>>
>> In `:jdk_instrument`: (tier 3)
>>
>> test/jdk/java/lang/instrument/RetransformAgent.java
>> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java
>> test/jdk/java/lang/instrument/asmlib/Instrumentor.java
>>
>>
>> @asotona Would you mind reviewing?
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Shorten lines, move from mask() to ACC_ constants, other misc improvements test/jdk/java/lang/Class/getSimpleName/GetSimpleNameTest.java line 174: > 172: clb.withSuperclass(CD_Object); > 173: clb.withFlags(AccessFlag.PUBLIC, AccessFlag.SUPER); > 174: clb.accept(InnerClassesAttribute.of( During the API discussions, `ClassfileBuilder::with` was slightly preferred over `ClassfileBuilder::accept`; however, it is just a cosmetic difference. ------------- PR: https://git.openjdk.org/jdk/pull/13009 From asotona at openjdk.org Thu Mar 16 14:22:19 2023 From: asotona at openjdk.org (Adam Sotona) Date: Thu, 16 Mar 2023 14:22:19 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2] In-Reply-To: References: Message-ID: <-jdVuSccJFgRr097u55X_v7lxZu2ZnQkZMSDZddnO_4=.abc0fb6c-ba84-49d4-bb43-2c0dd8ecf249@github.com> On Wed, 15 Mar 2023 04:09:04 GMT, Chen Liang wrote: >> Summaries: >> 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before >> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. >> 3. Most tests are included in tier1, but some are not: >> In `:jdk_io`: (tier2, part 2) >> >> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java >> test/jdk/java/io/Serializable/records/ProhibitedMethods.java >> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java >> >> In `:jdk_instrument`: (tier 3) >> >> test/jdk/java/lang/instrument/RetransformAgent.java >> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java >> test/jdk/java/lang/instrument/asmlib/Instrumentor.java >> >> >> @asotona Would you mind reviewing? 
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Shorten lines, move from mask() to ACC_ constants, other misc improvements I like the easy way I can read the tests code now even I don't know them. They look great :) ------------- PR: https://git.openjdk.org/jdk/pull/13009 From alanb at openjdk.org Thu Mar 16 14:53:21 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 16 Mar 2023 14:53:21 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 07:53:25 GMT, Alan Bateman wrote: > This is class descriptor for ProviderFactory$1, not "Provider" so maybe rename this to providerFactory1 or something a bit clearer. The updated version looks good. I assume you'll do a pass over the updated tests to bump their copyright date as this is the first change in 2023 for many of these tests. ------------- PR: https://git.openjdk.org/jdk/pull/13009 From liach at openjdk.org Thu Mar 16 15:03:58 2023 From: liach at openjdk.org (Chen Liang) Date: Thu, 16 Mar 2023 15:03:58 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v3] In-Reply-To: References: Message-ID: <9JGfanhjEcfkxJkObHi7of2axEP2J0eGlaDK7-DHQtI=.d22d8ee6-5897-4947-bc71-7da98248034d@github.com> > Summaries: > 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. > 3. 
Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Fix failed LambdaStackTrace test, use more convenient APIs - Merge branch 'master' of https://git.openjdk.java.net/jdk into invoke-test-classfile - Shorten lines, move from mask() to ACC_ constants, other misc improvements - Convert test/jdk/java ASM tests to classfile api ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/c6536bf9..a50b94f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=01-02 Stats: 68505 lines in 534 files changed: 42498 ins; 18129 del; 7878 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009 From dnsimon at openjdk.org Thu Mar 16 15:13:32 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 16 Mar 2023 15:13:32 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. 
>> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > SA and JVMCI fixes Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/12855 From liach at openjdk.org Thu Mar 16 15:23:53 2023 From: liach at openjdk.org (Chen Liang) Date: Thu, 16 Mar 2023 15:23:53 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v4] In-Reply-To: References: Message-ID: > Summaries: > 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. 
Others have their code generation infrastructure upgraded from ASM to Classfile API. > 3. Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Chen Liang has updated the pull request incrementally with one additional commit since the last revision: formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/a50b94f9..09bbe4d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009 From liach at openjdk.org Thu Mar 16 15:34:58 2023 From: liach at openjdk.org (Chen Liang) Date: Thu, 16 Mar 2023 15:34:58 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v5] In-Reply-To: References: Message-ID: > Summaries: > 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. > 3. 
Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Fix LambdaStackTrace after running ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/09bbe4d5..a728c9de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009 From liach at openjdk.org Thu Mar 16 15:34:59 2023 From: liach at openjdk.org (Chen Liang) Date: Thu, 16 Mar 2023 15:34:59 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v5] In-Reply-To: References: Message-ID: <-_EfAIuz6dKVBcqAq0pu6EnhIL5Qo_Qr7hXNdu4oJ2Q=.849415b7-db1e-444f-8026-7d7f3e5f48c4@github.com> On Thu, 16 Mar 2023 14:50:14 GMT, Alan Bateman wrote: >> test/jdk/java/util/ServiceLoader/BadProvidersTest.java line 216: >> >>> 214: clb.withSuperclass(CD_Object); >>> 215: clb.withFlags(AccessFlag.PUBLIC, AccessFlag.SUPER); >>> 216: var provider$1Desc = ClassDesc.of("p", "ProviderFactory$1"); >> >> This is class descriptor for ProviderFactory$1, not "Provider" so maybe rename this to providerFactory1 or something a bit clearer. 
> >> This is class descriptor for ProviderFactory$1, not "Provider" so maybe rename this to providerFactory1 or something a bit clearer. > > The updated version looks good. I assume you'll do a pass over the updated tests to bump their copyright date as this is the first change in 2023 for many of these tests. Yes, the copyright years are updated. Tested Serializable, instrument, and LambdaStackTrace as of "[Fix LambdaStackTrace after running](https://github.com/openjdk/jdk/pull/13009/commits/a728c9de1ff684bd30726eb8ea6e7a674cb5a140)" ------------- PR: https://git.openjdk.org/jdk/pull/13009 From rrich at openjdk.org Thu Mar 16 16:15:10 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 16:15:10 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335: > 2333: > 2334: __ load_resolved_indy_entry(cache, index); > 2335: __ ldr(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); Should this load have acquire semantics? Like [here in template interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239) and [here for the zero interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)? Call stack for zero interpreter is ConstantPoolCacheEntry::indices_ord() ConstantPoolCacheEntry::bytecode_1() ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code) BytecodeInterpreter::run(interpreterState) ------------- PR: https://git.openjdk.org/jdk/pull/12778 From jlu at openjdk.org Thu Mar 16 18:19:29 2023 From: jlu at openjdk.org (Justin Lu) Date: Thu, 16 Mar 2023 18:19:29 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2] In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. 
Justin Lu has updated the pull request incrementally with four additional commits since the last revision: - Bug6204853 should not be converted - Copyright year for CompileProperties - Redo translation for CS.properties - Spot convert CurrencySymbols.properties ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12726/files - new: https://git.openjdk.org/jdk/pull/12726/files/1e798f24..6d6bffe8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=00-01 Stats: 92 lines in 4 files changed: 0 ins; 0 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726 From jlu at openjdk.org Thu Mar 16 18:21:40 2023 From: jlu at openjdk.org (Justin Lu) Date: Thu, 16 Mar 2023 18:21:40 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Thu, 16 Mar 2023 18:19:29 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. > > Justin Lu has updated the pull request incrementally with four additional commits since the last revision: > > - Bug6204853 should not be converted > - Copyright year for CompileProperties > - Redo translation for CS.properties > - Spot convert CurrencySymbols.properties test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1: > 1: # Conversion did not work as expected, addressing right now. 
------------- PR: https://git.openjdk.org/jdk/pull/12726 From jlu at openjdk.org Thu Mar 16 18:21:43 2023 From: jlu at openjdk.org (Justin Lu) Date: Thu, 16 Mar 2023 18:21:43 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Wed, 15 Mar 2023 20:19:51 GMT, Naoto Sato wrote: >> Justin Lu has updated the pull request incrementally with four additional commits since the last revision: >> >> - Bug6204853 should not be converted >> - Copyright year for CompileProperties >> - Redo translation for CS.properties >> - Spot convert CurrencySymbols.properties > > test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156: > >> 154: zh=\u00A4 >> 155: zh_CN=\uFFE5 >> 156: zh_HK=HK$ > > Why are they not encoded into UTF-8 native? Not sure, thank you for catching it. Working on it right now. ------------- PR: https://git.openjdk.org/jdk/pull/12726 From jlu at openjdk.org Thu Mar 16 18:21:46 2023 From: jlu at openjdk.org (Justin Lu) Date: Thu, 16 Mar 2023 18:21:46 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2] In-Reply-To: <1I9v8d2OiyLfQVCozGYVRhAi3AotqGuRUhsNj0VCsUk=.e673ca33-d24f-4aab-908e-a5c0bfa3bf7c@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <1I9v8d2OiyLfQVCozGYVRhAi3AotqGuRUhsNj0VCsUk=.e673ca33-d24f-4aab-908e-a5c0bfa3bf7c@github.com> Message-ID: <_6WBGo5CQBseDEjMv16qCWmodFlYOO4gsT9WbON7ddA=.f94339a4-8893-47e4-8bb1-f28a8807ad9d@github.com> On Wed, 15 Mar 2023 16:18:44 GMT, Archie L. 
Cobbs wrote: >> Justin Lu has updated the pull request incrementally with four additional commits since the last revision: >> >> - Bug6204853 should not be converted >> - Copyright year for CompileProperties >> - Redo translation for CS.properties >> - Spot convert CurrencySymbols.properties > > test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1: > >> 1: # > > This file should probably be excluded because it's used in a test that relates to UTF-8 encoding (or not) of property files. Thank you, removed the changes for this file ------------- PR: https://git.openjdk.org/jdk/pull/12726 From jlu at openjdk.org Thu Mar 16 18:35:51 2023 From: jlu at openjdk.org (Justin Lu) Date: Thu, 16 Mar 2023 18:35:51 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3] In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. 
Justin Lu has updated the pull request incrementally with two additional commits since the last revision: - Reconvert CS.properties to UTF-8 - Revert all changes to CurrencySymbols.properties ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12726/files - new: https://git.openjdk.org/jdk/pull/12726/files/6d6bffe8..7119830b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01-02 Stats: 87 lines in 1 file changed: 0 ins; 0 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726 From jlu at openjdk.org Thu Mar 16 18:35:54 2023 From: jlu at openjdk.org (Justin Lu) Date: Thu, 16 Mar 2023 18:35:54 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Thu, 16 Mar 2023 18:31:23 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. 
> > Justin Lu has updated the pull request incrementally with two additional commits since the last revision: > > - Reconvert CS.properties to UTF-8 > - Revert all changes to CurrencySymbols.properties test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1: > 1: # CurrencySymbols.properties is fully converted to UTF-8 now ------------- PR: https://git.openjdk.org/jdk/pull/12726 From ron.pressler at oracle.com Thu Mar 16 18:48:13 2023 From: ron.pressler at oracle.com (Ron Pressler) Date: Thu, 16 Mar 2023 18:48:13 +0000 Subject: Disallowing the dynamic loading of agents by default Message-ID: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Hi. In JDK 21 we intend to disallow the dynamic loading of agents by default. This will affect tools that use the Attach API to load an agent into a JVM some time after the JVM has started [1]. There is no change to any of the mechanisms that load an agent at JVM startup (-javaagent/-agentlib on the command line or the Launcher-Agent-Class attribute in the main JAR's manifest). This change in default behavior was proposed in 2017 as part of JEP 261 [2][3]. At that time the consensus was to switch to this default not in JDK 9 but in a later release to give tool maintainers sufficient time to inform their users. To allow the dynamic loading of agents, users will need to specify -XX:+EnableDynamicAgentLoading on the command line. I'll post a draft JEP for review shortly. 
-- Ron [1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html [2]: https://openjdk.org/jeps/261 [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html From sspitsyn at openjdk.org Thu Mar 16 19:41:04 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Mar 2023 19:41:04 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> Message-ID: <8jh1y0hHWV5zGJ39PT8blO-DAbipR9Tes3y9txeZ_x0=.61a13d0f-6445-44b5-b996-be47ea82962e@github.com> On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > feedback Marked as reviewed by sspitsyn (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/13030 From dcubed at openjdk.org Thu Mar 16 19:51:10 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Mar 2023 19:51:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: <3B1OYwu_wroqQecRIZNNk7Yrrs_X_nk4hrjiC9IeGvk=.d3a9c5ac-7578-4d08-95a6-35b046d2cf26@github.com> On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. 
More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. 
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Several changes (mostly cosmetic) in response to reviews

I did a round of Mach5 Tier[1-4] testing on v26. Please see the bug report for the gory details. There are 12 tests in Tier4 that fail when -Xcheck:jni is used.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Thu Mar 16 19:55:18 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Thu, 16 Mar 2023 19:55:18 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer.
Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a recursive lock is attempted, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. 
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Several changes (mostly cosmetic) in response to reviews Another way to look at the option name question is to invert the sense of the option. The old stack-locking code would be enabled by this new `UseStackLocking` option (which would be on by default for now) and the newer locking code that uses a lock-stack that is embedded in the JavaThread would be the "else" case of the temporary `UseStackLocking` option. 
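For readers following the option-name discussion, the two code paths can be pictured with a small sketch. This is only an illustration under stated assumptions, not the code in the PR: `ObjRef`, `LockingScheme`, `select_scheme` and this `LockStack` class are hypothetical names, and a `std::vector` stands in for the per-thread array of oops described in the quoted text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Java object reference (an "oop" in HotSpot terms).
using ObjRef = const void*;

// Illustrative only: the two schemes discussed in this thread.
enum class LockingScheme { StackLocking, LockStack };

// Maps the proposed (inverted) UseStackLocking flag to a scheme:
// flag on -> old stack-locking, flag off -> the new lock-stack path.
inline LockingScheme select_scheme(bool use_stack_locking) {
  return use_stack_locking ? LockingScheme::StackLocking
                           : LockingScheme::LockStack;
}

// Per-thread lock-stack: a small growable array of fast-locked objects.
class LockStack {
 public:
  // Record that the owning thread has fast-locked obj.
  void push(ObjRef obj) { entries_.push_back(obj); }

  // Remove the most recently pushed entry for obj; returns false if absent.
  bool remove(ObjRef obj) {
    for (std::size_t i = entries_.size(); i > 0; --i) {
      if (entries_[i - 1] == obj) {
        entries_.erase(entries_.begin() + static_cast<std::ptrdiff_t>(i - 1));
        return true;
      }
    }
    return false;
  }

  // Answers "does the current thread own me?" with a linear scan,
  // which is cheap because the array usually holds only a few entries.
  bool contains(ObjRef obj) const {
    for (ObjRef e : entries_) {
      if (e == obj) return true;
    }
    return false;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::vector<ObjRef> entries_;
};
```

The linear scan in `contains` reflects the observation in the quoted description that the array typically stays at 3-5 elements; a real implementation would also need the overflow checks and GC-root scanning mentioned there, which this sketch leaves out.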
------------- PR: https://git.openjdk.org/jdk/pull/10907 From cjplummer at openjdk.org Thu Mar 16 20:31:02 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 16 Mar 2023 20:31:02 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> Message-ID: On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > feedback Marked as reviewed by cjplummer (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/13030 From rkennke at openjdk.org Thu Mar 16 20:56:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 20:56:15 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: Message-ID: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. 
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a recursive lock is attempted, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. 
But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? 
| 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 - Set condition flags correctly after fast-lock call on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/2445a19d..37f061b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=27-28 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 20:56:19 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 20:56:19 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a recursive lock is attempted, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. 
>> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. 
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? 
| 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Several changes (mostly cosmetic) in response to reviews In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 16 20:56:22 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Mar 2023 20:56:22 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. 
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. 
But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Several changes (mostly cosmetic) in response to reviews

src/hotspot/cpu/x86/x86_32.ad line 617:

> 615:   int bangsize = C->output()->bang_size_in_bytes();
> 616:
> 617:   __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), C->stub_function() != NULL);

Why did this change from `nullptr` -> `NULL`?
src/hotspot/cpu/x86/x86_64.ad line 925: > 923: } > 924: > 925: __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, false, C->stub_function() != NULL); Why did this change from `nullptr` -> `NULL`? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 21:00:37 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 21:00:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 20:50:12 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Several changes (mostly cosmetic) in response to reviews > > src/hotspot/cpu/x86/x86_32.ad line 617: > >> 615: int bangsize = C->output()->bang_size_in_bytes(); >> 616: >> 617: __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), C->stub_function() != NULL); > > Why did this change from `nullptr` -> `NULL`? I reverted that part back to upstream state (at least what is in JDK-21+13). ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 16 21:09:18 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Mar 2023 21:09:18 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke wrote: > I pushed a fix for aarch64. Do you think this is the cause for the -Xcheck:jni failures that I ran into in my Tier4 testing? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 16 21:09:20 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Mar 2023 21:09:20 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 20:57:31 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/x86_32.ad line 617: >> >>> 615: int bangsize = C->output()->bang_size_in_bytes(); >>> 616: >>> 617: __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), C->stub_function() != NULL); >> >> Why did this change from `nullptr` -> `NULL`? > > I reverted that part back to upstream state (at least what is in JDK-21+13). Okay. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From cjplummer at openjdk.org Thu Mar 16 21:10:49 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 16 Mar 2023 21:10:49 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" Message-ID: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> The debuggee main method creates two threads and then starts them: public static void main(String[] args) { System.out.println("Howdy!"); Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); t1.start(); t2.start(); } These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. 
The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. ------------- Commit messages: - Make sure main() debuggee method does not exit until test threads are complete. Changes: https://git.openjdk.org/jdk/pull/13068/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13068&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290200 Stats: 13 lines in 2 files changed: 6 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13068.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13068/head:pull/13068 PR: https://git.openjdk.org/jdk/pull/13068 From cjplummer at openjdk.org Thu Mar 16 21:10:52 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 16 Mar 2023 21:10:52 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. 
There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. test/jdk/com/sun/jdi/InvokeHangTest.java line 223: > 221: mainThread = bpe.thread(); > 222: EventRequestManager erm = vm().eventRequestManager(); > 223: final Thread mainTestThread = Thread.currentThread(); This local variable was conflicting with an instance field of the same name. See line 135. I ran into issues when I added some debugging code to this method that referenced the other `mainThread`, so I did a rename. ------------- PR: https://git.openjdk.org/jdk/pull/13068 From rkennke at openjdk.org Thu Mar 16 21:19:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 21:19:29 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 21:05:54 GMT, Daniel D. Daugherty wrote: > > I pushed a fix for aarch64. > > > > Do you think this is the cause for the -Xcheck:jni failures that I ran into > > in my Tier4 testing? Yes, and with high probability also for some/all of the other failures. 
It leads to the situation that when the lock-stack is full, it should take the slow-path, but doesn't (because the flags are not set correctly) and thus leaves the object unlocked. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From amenkov at openjdk.org Thu Mar 16 21:28:32 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 16 Mar 2023 21:28:32 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: <6EyxOum8MbaEMT419PYhRFcmaHCi5_Sny7j1-57qQuo=.a34e20fd-d3b5-4949-a51c-c33a39693389@github.com> On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected.
> > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. Marked as reviewed by amenkov (Reviewer). test/jdk/com/sun/jdi/InvokeHangTest.java line 200: > 198: try { > 199: StackFrame sf = thread.frame(0); > 200: System.out.println(" Debugger: Breakpoint hit at "+sf.location()); while you are here please add spaces around plus ------------- PR: https://git.openjdk.org/jdk/pull/13068 From dcubed at openjdk.org Thu Mar 16 21:33:32 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Mar 2023 21:33:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: <-aGsX_dmmSBQPrgTVaCsZVU3gFQg1gs9gqS8RFzkRC4=.21ec8ca5-18c5-4f8a-a286-f0c1a32bdeee@github.com> On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a recursive lock is attempted, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header.
This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Set condition flags correctly after fast-lock call on aarch64

I've reviewed the v27 and v28 changes and kicked off yet another round of Mach5 testing.
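
For readers following the quoted description, the lock-stack idea can be modelled in a few lines of Java. This is only a sketch under stated assumptions: the real structure lives in HotSpot's C++ code and holds oops, and the class name `LockStackSketch`, its method names, and the initial capacity of 4 are hypothetical, chosen for illustration rather than taken from the patch.

```java
import java.util.Arrays;

/** Illustrative model only: a per-thread stack of objects the thread has fast-locked. */
final class LockStackSketch {
    private Object[] slots = new Object[4]; // hypothetical initial capacity
    private int top = 0;

    /** True when the next push would overflow, i.e. the slow path would have to grow the stack. */
    boolean isFull() {
        return top == slots.length;
    }

    /** Stand-in for the call into the runtime that grows the stack. */
    void grow() {
        slots = Arrays.copyOf(slots, slots.length * 2);
    }

    /** Record that the owning thread has fast-locked obj. */
    void push(Object obj) {
        if (isFull()) {
            grow();
        }
        slots[top++] = obj;
    }

    /** Remove obj on unlock, searching from the top because locks are usually released LIFO. */
    void remove(Object obj) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == obj) {
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null;
                return;
            }
        }
        throw new IllegalStateException("object is not on the lock-stack");
    }

    /** Answers 'does the current thread own me?' with a linear scan by identity. */
    boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == obj) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return top;
    }

    public static void main(String[] args) {
        LockStackSketch lockStack = new LockStackSketch();
        Object lock = new Object();
        lockStack.push(lock);
        System.out.println("owned after push: " + lockStack.contains(lock));
        lockStack.remove(lock);
        System.out.println("owned after remove: " + lockStack.contains(lock));
    }
}
```

The linear scan is the point of the design as described: with only a handful of entries per thread, an ownership check stays cheap without storing a stack pointer in the object header.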
------------- PR: https://git.openjdk.org/jdk/pull/10907 From matsaave at openjdk.org Thu Mar 16 21:39:57 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 16 Mar 2023 21:39:57 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 16:11:57 GMT, Richard Reingruber wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed aarch64 interpreter mistake > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335: > >> 2333: >> 2334: __ load_resolved_indy_entry(cache, index); >> 2335: __ ldr(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); > > Should this load have acquire semantics? > Like [here in template interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239) and [here for the zero interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)? > > Call stack for zero interpreter is > > ConstantPoolCacheEntry::indices_ord() > ConstantPoolCacheEntry::bytecode_1() > ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code) > BytecodeInterpreter::run(interpreterState) Yes, acquire semantics should be used here. Thank you for pointing this out! ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dholmes at openjdk.org Thu Mar 16 22:38:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 22:38:39 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. 
It has poor extension capabilities, complex management code because of lack of strong typing and semantic overloading, and poor memory efficiency.
>>
>> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>>
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>>
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM.
>>
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>>
>> Tested with mach5, tier 1 to 7.
>>
>> Thank you.
>
> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision:
>
>   SA and JVMCI fixes

Nice piece of work Fred - I won't pretend to follow every detail. A few nits on unnecessary alignment (which may match pre-existing style not evident in the diff).

Thanks.
src/hotspot/share/oops/fieldInfo.inline.hpp line 170:

> 168:     new_flags = old_flags & ~mask;
> 169:     witness = Atomic::cmpxchg(&flags, old_flags, new_flags);
> 170:   } while(witness != old_flags);

Just to prove I did read this :) space needed after `while`

src/hotspot/share/oops/fieldInfo.inline.hpp line 174:

> 172:
> 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) {
> 174:   if (z) atomic_set_bits( _flags, flag_mask(pos));

Nit: extra space before `_flags`

src/hotspot/share/oops/fieldInfo.inline.hpp line 175:

> 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) {
> 174:   if (z) atomic_set_bits( _flags, flag_mask(pos));
> 175:   else atomic_clear_bits(_flags, flag_mask(pos));

Nit: no need for the extra spaces. If you really want these to align just place them on new lines.

src/hotspot/share/oops/instanceKlass.inline.hpp line 50:

> 48:
> 49: inline Symbol* InstanceKlass::field_name (int index) const { return field(index).name(constants()); }
> 50: inline Symbol* InstanceKlass::field_signature (int index) const { return field(index).signature(constants()); }

There should not be spaces between a method name and the opening `(`. I'm really not a fan of this kind of alignment.

-------------

Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12855 From lmesnik at openjdk.org Thu Mar 16 23:25:19 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 16 Mar 2023 23:25:19 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. Marked as reviewed by lmesnik (Reviewer). 
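
The fix discussed in this thread (joining t1 and t2 before main() returns) can be illustrated in isolation. The following is a minimal sketch, not the actual InvokeHangTarg change: the class `JoinSketch` and the method `runAndJoin` are hypothetical names, the sleep merely stands in for the breakpoint loop, and platform daemon threads are used as a stand-in for virtual threads so that the example does not depend on a JDK with virtual thread support.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: daemon workers do not keep the JVM alive, so main() must join them. */
final class JoinSketch {

    /** Starts two daemon workers, joins both, and returns how many ran to completion. */
    static int runAndJoin() {
        AtomicInteger finished = new AtomicInteger();
        Runnable work = () -> {
            try {
                Thread.sleep(50); // stand-in for the work done by the test threads
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            finished.incrementAndGet();
        };

        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        // Daemon status mimics virtual threads: the JVM will not wait for these on exit.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        try {
            // Without these joins the caller could return (and the JVM exit) mid-work.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining workers", e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("workers finished: " + runAndJoin());
    }
}
```

Removing the two `join()` calls from this sketch reproduces the shape of the original problem: main() can return while the daemon workers are still running, and the JVM is then free to shut down.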
-------------

PR: https://git.openjdk.org/jdk/pull/13068

From lmesnik at openjdk.org Fri Mar 17 00:01:21 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 17 Mar 2023 00:01:21 GMT
Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE
Message-ID:

The com/sun/jdi tests are located in one package, and classes with the same name cause a 'class duplication error' when this directory is opened as source code in an IDE. The simplest fix to avoid this is just to rename the classes.

-------------

Commit messages:
 - renamed classes

Changes: https://git.openjdk.org/jdk/pull/13069/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13069&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8304376
Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod
Patch: https://git.openjdk.org/jdk/pull/13069.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/13069/head:pull/13069

PR: https://git.openjdk.org/jdk/pull/13069

From sspitsyn at openjdk.org Fri Mar 17 01:36:47 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 17 Mar 2023 01:36:47 GMT
Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v3]
In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com>
References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com>
Message-ID:

> This is needed for performance improvements in support of virtual threads.
> The update includes the following:
>
> 1. Refactored the `VirtualThread` native methods:
>     `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount`
>     `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount`
> 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`:
>     `JVM_VirtualThreadMountStart` => `VTMS_mount_begin`
>     `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
>     `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin`
>     `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end`
> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
> 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`.
> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable.
>
> Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now.
>
> Testing:
>  - Ran mach5 tiers 1-6. No regressions were found.

Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:

  address pre-review comments from Leonid

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13054/files
  - new: https://git.openjdk.org/jdk/pull/13054/files/397b6337..f3692263

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01-02

Stats: 22 lines in 2 files changed: 14 ins; 4 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/13054.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054

PR: https://git.openjdk.org/jdk/pull/13054

From stuefe at openjdk.org Fri Mar 17 06:18:40 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 17 Mar 2023 06:18:40 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke wrote:

> In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64.
This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 17 06:36:37 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 17 Mar 2023 06:36:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> On Fri, 17 Mar 2023 06:15:28 GMT, Thomas Stuefe wrote: > > In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. > > > > I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null register. Tst is better at least on x86 because it encodes smaller. *shrugs* You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? 
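To make the flag behaviour discussed above concrete, here is a small model of how the zero (EQ) flag falls out of a compare versus a test. It is plain Java rather than assembly, the class and method names are hypothetical, and it models only the Z flag under the usual definition (cmp subtracts, tst ANDs); it is not HotSpot code.

```java
// Illustrative model only: derives the Z (zero/EQ) flag the way a compare
// and a test instruction would. Not taken from the fast-locking patch.
final class ZeroFlagSketch {
    // cmp a, b computes a - b and sets Z when the result is zero.
    static boolean zAfterCmp(long a, long b) {
        return a - b == 0;
    }

    // tst a, b computes a & b and sets Z when the result is zero.
    static boolean zAfterTst(long a, long b) {
        return (a & b) == 0;
    }

    public static void main(String[] args) {
        long r = 0x7f001000L; // hypothetical non-null register value
        System.out.println("cmp r, r -> EQ=" + zAfterCmp(r, r));
        System.out.println("cmp r, 0 -> EQ=" + zAfterCmp(r, 0));
        System.out.println("tst r, r -> EQ=" + zAfterTst(r, r));
    }
}
```

For any non-null r, cmp r, r always sets EQ, while cmp r, 0 and tst r, r both clear it, which is why either of the latter two can be used to force the NE (slow-path) state.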
------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Fri Mar 17 06:44:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 17 Mar 2023 06:44:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> References: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> Message-ID: On Fri, 17 Mar 2023 06:33:43 GMT, Roman Kennke wrote: > > > > In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. > > > > > > I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. > > Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null register. Sure. I used cmp with an immediate that I knew was not the value. Clunky, I know. As I wrote, tst seems better. > Tst is better at least on x86 because it encodes smaller. _shrugs_ > > You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? I felt just better having it there, at least for the start. I may still move it outside to C2. Lets see. 
------------- PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Fri Mar 17 07:24:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Mar 2023 07:24:21 GMT Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 23:54:13 GMT, Leonid Mesnik wrote: > The com/sun/jdi tests are located in the on package, and classes with same name cause 'class duplication error' when this directory is opened as source code in IDE. > > The simplest fix to avoid this is just to rename class. The fix looks good in general - approved. Nit: But why don't use longer/better class names? ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/13069 From sspitsyn at openjdk.org Fri Mar 17 07:30:22 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Mar 2023 07:30:22 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. 
The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/13068 From aturbanov at openjdk.org Fri Mar 17 08:13:12 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 17 Mar 2023 08:13:12 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v5] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 15:34:58 GMT, Chen Liang wrote: >> Summaries: >> 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before >> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. >> 3. 
Most tests are included in tier1, but some are not: >> In `:jdk_io`: (tier2, part 2) >> >> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java >> test/jdk/java/io/Serializable/records/ProhibitedMethods.java >> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java >> >> In `:jdk_instrument`: (tier 3) >> >> test/jdk/java/lang/instrument/RetransformAgent.java >> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java >> test/jdk/java/lang/instrument/asmlib/Instrumentor.java >> >> >> @asotona Would you mind reviewing? > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Fix LambdaStackTrace after running test/jdk/java/lang/invoke/8022701/MHIllegalAccess.java line 50: > 48: public class MHIllegalAccess { > 49: > 50: public static void main(String[] args) throws Throwable { Suggestion: public static void main(String[] args) throws Throwable { ------------- PR: https://git.openjdk.org/jdk/pull/13009 From stuefe at openjdk.org Fri Mar 17 09:04:24 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 17 Mar 2023 09:04:24 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> References: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> Message-ID: On Fri, 17 Mar 2023 06:33:43 GMT, Roman Kennke wrote: >>> In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. >> >> I noticed this too for arm; I used cmp to clear EQ but using tst seems better. 
I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. > >> > In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. >> >> >> >> I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. > > Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null register. Tst is better at least on x86 because it encodes smaller. *shrugs* > > You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? @rkennke I was not able to directly use 'JavaThread::lock_stack_offset_offset()' in cmp since it was not encodable as immediate. You did not hit the same problem on aarch64, right? IIUC that was more out of accident, since you should have similar or the same (not sure) restrictions for encoding immediates. But your Thread layout is probably different and the offset may just happened to be encodable. If so, that would make you vulnerable against changes in Thread that change the offset of the LockStack. Anyway, for now I solved this by using the second scratch register as intermediate. One more instruction though. I am now experimenting with my original idea of placing the Lockstack slots at a known aligned offset and then testing the alignment of the current index/pointer. 
This should be possible with a simple TST. Lets see how this goes. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Fri Mar 17 10:31:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Mar 2023 10:31:46 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> Message-ID: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> > This is needed for performance improvements in support of virtual threads. > The update includes the following: > > 1. Refactored the `VirtualThread` native methods: > `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` > `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` > 2. Still useful implementation of old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: > `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` > `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` > `JVM_VirtualThreadUnmountStart` = > `VTMS_unmount_begin` > `JVM_VirtualThreadUnmountEnd` => `VTMS_mount_end` > 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. > 4. Removed the`VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. > 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. > > Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. > > Testing: > - Ran mach5 tiers 1-6. No regressions were found. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor tweaks in intrisics implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13054/files - new: https://git.openjdk.org/jdk/pull/13054/files/f3692263..8233f0ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054 From thomas.stuefe at gmail.com Fri Mar 17 13:33:31 2023 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 17 Mar 2023 14:33:31 +0100 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: Hi Ron, Will this affect attaching via jcmd? Thanks, Thomas On Thu, Mar 16, 2023 at 7:48?PM Ron Pressler wrote: > Hi. > > In JDK 21 we intend to disallow the dynamic loading of agents by default. > This > will affect tools that use the Attach API to load an agent into a JVM some > time > after the JVM has started [1]. There is no change to any of the mechanisms > that > load an agent at JVM startup (-javaagent/-agentlib on the command line or > the > Launcher-Agent-Class attribute in the main JAR's manifest). > > This change in default behavior was proposed in 2017 as part of JEP 261 > [2][3]. > At that time the consensus was to switch to this default not in JDK 9 but > in a > later release to give tool maintainers sufficient time to inform their > users. > To allow the dynamic loading of agents, users will need to specify > -XX:+EnableDynamicAgentLoading on the command line. > > I'll post a draft JEP for review shortly. 
> > -- Ron > > [1]: > https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html > [2]: https://openjdk.org/jeps/261 > [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html From ron.pressler at oracle.com Fri Mar 17 13:42:03 2023 From: ron.pressler at oracle.com (Ron Pressler) Date: Fri, 17 Mar 2023 13:42:03 +0000 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: On 17 Mar 2023, at 13:33, Thomas Stüfe wrote: Hi Ron, Will this affect attaching via jcmd? The Attach mechanism will not be disabled by default, just the ability to load agents via the Attach mechanism. So the only jcmd command that will be affected is JVMTI.agent_load. To see the effect of the change today, launch java with -XX:-EnableDynamicAgentLoading, which is to become the new default. -- Ron From fparain at openjdk.org Fri Mar 17 13:51:05 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 13:51:05 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v6] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed.
The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: Style fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/f81337f7..ab57b03a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=04-05 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From thomas.stuefe at gmail.com Fri Mar 17 14:11:17 2023 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 17 Mar 2023 15:11:17 +0100 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: Thank you for the clarification. Oddly enough, -XX:-EnableDynamicAgentLoading seems to be broken.
Tried head (fastdebug, release) and JDK17, even with this switch my sample library loads just fine:

```
thomas at starfish$ ./images/jdk/bin/java -XX:-EnableDynamicAgentLoading -XX:+PrintFlagsFinal -cp $REPROS_JAR de.stuefe.repros.Simple
[Global flags]
...
bool EnableDynamicAgentLoading = false {product} {command line}
...

OnAttach! Loading JVMTI sample agent
```

Investigation shows that there seems to be a bug in attachListener.cpp where we compare AttachOperation::name for "load", but it contains "jcmd":

```
Thread 22 "Attach Listener" hit Breakpoint 1, attach_listener_thread_entry (thread=0x7fff94000fd0, __the_thread__=0x7fff94000fd0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/attachListener.cpp:404
404 } else if (!EnableDynamicAgentLoading && strcmp(op->name(), "load") == 0) {
(gdb) p op
$1 = (AttachOperation *) 0x7fff7401b640
(gdb) p *op
$2 = {> = {}, _vptr.AttachOperation = 0x7ffff7b61210 , _name = "jcmd\000", '\361' , , _arg = {
"JVMTI.agent_load /shared/projects/jvmti-sample/sample.so\000", '\361' ..., "\000", '\361' ..., "\000", '\361' ...}}
(gdb) p op->name()
$3 = 0x7fff7401b648 "jcmd"
```

This was on Linux x64.

So if people have been using -XX:-EnableDynamicAgentLoading to check their code, this may not have worked as intended.

Cheers, Thomas

On Fri, Mar 17, 2023 at 2:42 PM Ron Pressler wrote:

>
> On 17 Mar 2023, at 13:33, Thomas Stüfe wrote:
>
> Hi Ron,
>
> Will this affect attaching via jcmd?
>
>
> The Attach mechanism will not be disabled by default, just the ability to
> load agents via the Attach mechanism.
> So the only jcmd command that will be affected is JVMTI.agent_load.
>
> To see the effect of the change today, launch java with
> -XX:-EnableDynamicAgentLoading, which is
> to become the new default.
>
> -- Ron
>

From ron.pressler at oracle.com Fri Mar 17 14:29:56 2023
From: ron.pressler at oracle.com (Ron Pressler)
Date: Fri, 17 Mar 2023 14:29:56 +0000
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com>
Message-ID: <8639867E-0D0A-402D-8A3B-41BB12A8F0BF@oracle.com>

> On 17 Mar 2023, at 14:11, Thomas Stüfe wrote:
>
> Thank you for the clarification.
>
> Oddly enough, -XX:-EnableDynamicAgentLoading seems to be broken. Tried head (fastdebug, release) and JDK17, even with this switch my sample library loads just fine:
>
> ```
> thomas at starfish$ ./images/jdk/bin/java -XX:-EnableDynamicAgentLoading -XX:+PrintFlagsFinal -cp $REPROS_JAR de.stuefe.repros.Simple
> [Global flags]
> ...
> bool EnableDynamicAgentLoading = false {product} {command line}
> ...
>
> OnAttach! Loading JVMTI sample agent
> ```
>
> Investigation shows that there seems to be a bug in attachListener.cpp where we compare AttachOperation::name for "load", but it contains "jcmd":
>
> ```
> Thread 22 "Attach Listener" hit Breakpoint 1, attach_listener_thread_entry (thread=0x7fff94000fd0, __the_thread__=0x7fff94000fd0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/attachListener.cpp:404
> 404 } else if (!EnableDynamicAgentLoading && strcmp(op->name(), "load") == 0) {
> (gdb) p op
> $1 = (AttachOperation *) 0x7fff7401b640
> (gdb) p *op
> $2 = {> = {}, _vptr.AttachOperation = 0x7ffff7b61210 , _name = "jcmd\000", '\361' , , _arg = {
> "JVMTI.agent_load /shared/projects/jvmti-sample/sample.so\000", '\361' ..., "\000", '\361' ..., "\000", '\361' ...}}
> (gdb) p op->name()
> $3 = 0x7fff7401b648 "jcmd"
> ```
>
> This was on Linux x64.
>
> So if people have been using -XX:-EnableDynamicAgentLoading to check their code, this may not have worked as intended.
>
> Cheers, Thomas

There may be a missing check in JVMTIAgentLoadDCmd::execute in diagnosticCommand.cpp.

Thank you for reporting this!

--
Ron From aturbanov at openjdk.org Fri Mar 17 14:54:30 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 17 Mar 2023 14:54:30 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v6] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 13:51:05 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you.
> > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Style fixes src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 268: > 266: > 267: Field getField(int index) { > 268: synchronized(this) { nit Suggestion: synchronized (this) { src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/tools/jcore/ClassWriter.java line 380: > 378: dos.writeShort(accessFlags & (short) JVM_RECOGNIZED_FIELD_MODIFIERS); > 379: > 380: int nameIndex = klass.getFieldNameIndex(index); nit: Suggestion: int nameIndex = klass.getFieldNameIndex(index); ------------- PR: https://git.openjdk.org/jdk/pull/12855 From liach at openjdk.org Fri Mar 17 14:57:35 2023 From: liach at openjdk.org (Chen Liang) Date: Fri, 17 Mar 2023 14:57:35 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v6] In-Reply-To: References: Message-ID: > Summaries: > 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. > 3. Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? 
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/java/lang/invoke/8022701/MHIllegalAccess.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/a728c9de..271cb98d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009 From aturbanov at openjdk.org Fri Mar 17 14:59:43 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 17 Mar 2023 14:59:43 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyEntry.java line 49: > 47: > 48: private static synchronized void initialize(TypeDataBase db) throws WrongTypeException { > 49: Type type = db.lookupType("ResolvedIndyEntry"); Suggestion: Type type = db.lookupType("ResolvedIndyEntry"); ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dcubed at openjdk.org Fri Mar 17 15:18:02 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 17 Mar 2023 15:18:02 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. 
It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. Marked as reviewed by dcubed (Reviewer). test/jdk/com/sun/jdi/InvokeHangTest.java line 67: > 65: } catch (InterruptedException e) { > 66: throw new RuntimeException(e); > 67: } Please consider adding a comment that explains that the `join()` calls are only necessary when `t1` and `t2` are virtual threads. ------------- PR: https://git.openjdk.org/jdk/pull/13068 From fparain at openjdk.org Fri Mar 17 15:58:35 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 15:58:35 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v7] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
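The memory-density argument for a variable-sized stream can be illustrated with a generic variable-length integer encoding. The sketch below is a LEB128-style scheme chosen for brevity; it is an assumption made for illustration and is not HotSpot's unsigned5 format, and the class, method names and sample values are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: a generic LEB128-style varint, NOT the unsigned5 format.
final class VarIntSketch {
    // Appends value as 7-bit groups, lowest group first; the high bit of a
    // byte means "another byte follows". The int is treated as unsigned.
    static void write(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Decodes one value starting at pos[0] and advances pos[0] past it.
    static int read(byte[] buf, int[] pos) {
        int result = 0;
        for (int shift = 0; ; shift += 7) {
            int b = buf[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    // Packs all values back to back into one byte stream.
    static byte[] encode(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            write(out, v);
        }
        return out.toByteArray();
    }

    // Reads back exactly count values from the start of the stream.
    static int[] decode(byte[] buf, int count) {
        int[] pos = {0};
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = read(buf, pos);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical per-field values: name index, signature index, offset.
        byte[] packed = encode(12, 300, 70000);
        System.out.println(packed.length + " bytes instead of " + 3 * Integer.BYTES);
    }
}
```

Small values, which dominate in field metadata such as constant-pool indices, take one or two bytes each, so the three sample values occupy 6 bytes rather than the 12 that fixed 32-bit slots would need.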
> > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Style fixes - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 - Style fixes - SA and JVMCI fixes - Fixes includes and style - SA additional caching from Chris Plummer - Addressing comments from first reviews - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 - Reimplementation of FieldInfo as an unsigned5 stream ------------- Changes: https://git.openjdk.org/jdk/pull/12855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=06 Stats: 1790 lines in 54 files changed: 927 ins; 483 del; 380 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From cjplummer at openjdk.org Fri Mar 17 17:21:55 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 17 Mar 2023 17:21:55 GMT Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 07:21:41 GMT, Serguei Spitsyn wrote: > Nit: But why don't use longer/better class names? You also might want to make it start with an uppercase letter.
------------- PR: https://git.openjdk.org/jdk/pull/13069 From vlivanov at openjdk.org Fri Mar 17 17:36:09 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 17 Mar 2023 17:36:09 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for performance improvements in support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end` >> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found.
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrisics implementation Overall, compiler changes look good. Any performance numbers to justify the intrinsification? ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.org/jdk/pull/13054 From amenkov at openjdk.org Fri Mar 17 18:26:55 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 17 Mar 2023 18:26:55 GMT Subject: Integrated: 8303921: serviceability/sa/UniqueVtableTest.java timed out In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:05:44 GMT, Alex Menkov wrote: > The change: > - updates UniqueVtableTest to follow the standard SA way: attach to the target from a subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; > - updates several tests in the same directory to resolve NoClassDefFoundError failures; it's a known JTReg issue that "@build" actions for only part of the shared classes used may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. > > Tested: 100 runs on all platforms, no failures This pull request has now been integrated.
Changeset: 02a4ee20 Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/02a4ee206a979858c23c22da35e70560e0f27efd Stats: 77 lines in 6 files changed: 39 ins; 22 del; 16 mod 8303921: serviceability/sa/UniqueVtableTest.java timed out Reviewed-by: cjplummer, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/13030 From sspitsyn at openjdk.org Fri Mar 17 18:36:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Mar 2023 18:36:18 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 17:33:46 GMT, Vladimir Ivanov wrote: > Overall, compiler changes look good. > Any performance numbers to justify the intrinsification? Thank you for review and your guidance and help with C2 intrinsification! My goal was to move the notifyJvmtiEvents checks from Java to VM side without a performance penalty. I do not observe any performance degradation with customized Skynet benchmark executing 5 million virtual threads. 
Used the `time` utility to measure total execution time (in milliseconds) of 10 runs on Oracle Linux server: - without intrinsics: 6083, 5405, 5270, 5700, 5004, 5402, 5536, 5031, 4902, 5124 - with intrinsics: 5904, 5287, 5470, 5672, 5298, 5053, 6154, 4992, 6237, 5155 ------------- PR: https://git.openjdk.org/jdk/pull/13054 From matsaave at openjdk.org Fri Mar 17 19:53:28 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 17 Mar 2023 19:53:28 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v7] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Acquire semantics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/9a3a63ae..3dc112b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=05-06 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From fparain at openjdk.org Fri Mar 17 20:19:39 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 20:19:39 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v7] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:58:35 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Style fixes > - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 > - Style fixes > - SA and JVMCI fixes > - Fixes includes and style > - SA additional caching from Chris Plummer > - Addressing comments from first reviews > - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 > - Reimplementation of FieldInfo as an unsigned5 stream Chris, Doug, thank you for your reviews and your help. Coleen, David, Andrey, thank you for your reviews. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Fri Mar 17 20:22:48 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 20:22:48 GMT Subject: Integrated: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: <1g-MNikg2bzX02U4IDcsKO4nGqPUWZI-77gTMrmQtlA=.5a102edd-69f2-4cd8-9738-fe5075f02f2c@github.com> On Fri, 3 Mar 2023 14:50:34 GMT, Frederic Parain wrote: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency.
> > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. This pull request has now been integrated. Changeset: bfb812a8 Author: Frederic Parain URL: https://git.openjdk.org/jdk/commit/bfb812a8ff8bca70aed7695c73f019ae66ac6f33 Stats: 1790 lines in 54 files changed: 927 ins; 483 del; 380 mod 8292818: replace 96-bit representation for field metadata with variable-sized streams Co-authored-by: John R Rose Co-authored-by: Chris Plummer Reviewed-by: dholmes, coleenp, cjplummer, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/12855 From jlu at openjdk.org Fri Mar 17 20:28:13 2023 From: jlu at openjdk.org (Justin Lu) Date: Fri, 17 Mar 2023 20:28:13 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4] In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence).
The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Adjust CF test to read in with UTF-8 to fix failing test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12726/files - new: https://git.openjdk.org/jdk/pull/12726/files/7119830b..007c78a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=02-03 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726 From angorya at openjdk.org Fri Mar 17 20:34:00 2023 From: angorya at openjdk.org (Andy Goryachev) Date: Fri, 17 Mar 2023 20:34:00 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: <-3wtWK_Pdt1fqDnSjbS6JTGLwboJi7Tw2sV0v7LQ3Os=.7036d0b0-2524-43bc-a82d-640f29fd35a0@github.com> On Fri, 17 Mar 2023 20:28:13 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. 
> > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Adjust CF test to read in with UTF-8 to fix failing test make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java line 226: > 224: Properties p = new Properties(); > 225: try { > 226: FileInputStream input = new FileInputStream(propertiesPath); Should this stream be closed in a finally { } block? ------------- PR: https://git.openjdk.org/jdk/pull/12726 From naoto at openjdk.org Fri Mar 17 21:05:18 2023 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 17 Mar 2023 21:05:18 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4] In-Reply-To: <-3wtWK_Pdt1fqDnSjbS6JTGLwboJi7Tw2sV0v7LQ3Os=.7036d0b0-2524-43bc-a82d-640f29fd35a0@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <-3wtWK_Pdt1fqDnSjbS6JTGLwboJi7Tw2sV0v7LQ3Os=.7036d0b0-2524-43bc-a82d-640f29fd35a0@github.com> Message-ID: On Fri, 17 Mar 2023 20:31:27 GMT, Andy Goryachev wrote: >> Justin Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust CF test to read in with UTF-8 to fix failing test > > make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java line 226: > >> 224: Properties p = new Properties(); >> 225: try { >> 226: FileInputStream input = new FileInputStream(propertiesPath); > > Should this stream be closed in a finally { } block? or better be `try-with-resources`? 
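For readers following this exchange, the try-with-resources form suggested above can be sketched roughly as follows. This is a minimal illustration, not the actual CompileProperties change: the class name `LoadPropsSketch` and the helper `loadProperties` are hypothetical, and reading through a UTF-8 `Reader` is an assumption based on the PR description.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch; names are illustrative and not taken from the JDK build tools.
class LoadPropsSketch {

    // Loads a .properties file as UTF-8. The reader is closed automatically
    // when the try block exits, whether normally or with an exception.
    static Properties loadProperties(Path propertiesPath) throws IOException {
        Properties p = new Properties();
        try (BufferedReader reader =
                 Files.newBufferedReader(propertiesPath, StandardCharsets.UTF_8)) {
            p.load(reader);
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file, then read it back through the helper.
        Path tmp = Files.createTempFile("sketch", ".properties");
        Files.writeString(tmp, "greeting=Gr\u00fc\u00dfe\n", StandardCharsets.UTF_8);
        System.out.println(loadProperties(tmp).getProperty("greeting"));
        Files.delete(tmp);
    }
}
```

Compared with an explicit finally block, this form needs no null check on the stream and does not lose the original exception if closing also fails.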
------------- PR: https://git.openjdk.org/jdk/pull/12726 From weijun at openjdk.org Fri Mar 17 21:52:23 2023 From: weijun at openjdk.org (Weijun Wang) Date: Fri, 17 Mar 2023 21:52:23 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: <1JBZe7nrM-HsVEItfK-3GPeXoX_glyM9SL4ZACUbLwk=.3a3cf62b-0960-4b03-80aa-2756bd1636dc@github.com> On Fri, 17 Mar 2023 20:28:13 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Adjust CF test to read in with UTF-8 to fix failing test make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java line 326: > 324: outBuffer.append(toHex((aChar >> 8) & 0xF)); > 325: outBuffer.append(toHex((aChar >> 4) & 0xF)); > 326: outBuffer.append(toHex(aChar & 0xF)); Sorry I don't know when this tool is called, but why is it still writing in `\unnnn` style? 
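As context for the question, the quoted lines appear to be the tail of a routine that writes a character as a backslash-u escape, one hex digit per 4-bit nibble. A minimal sketch of that technique follows; `UnicodeEscapeSketch`, `escape` and this `toHex` are hypothetical names for illustration, and the handling of the top nibble (shift by 12) is assumed because it is not shown in the quoted lines.

```java
// Hypothetical sketch of nibble-by-nibble Unicode escaping; not the
// actual CompileProperties code.
class UnicodeEscapeSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Maps the low four bits of the argument to one hex digit.
    static char toHex(int nibble) {
        return HEX[nibble & 0xF];
    }

    // Leaves printable ASCII untouched and writes every other char as a
    // backslash, the letter u, and four hex digits.
    static String escape(String s) {
        StringBuilder outBuffer = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char aChar = s.charAt(i);
            if (aChar < 0x20 || aChar > 0x7E) {
                outBuffer.append('\\').append('u');
                outBuffer.append(toHex((aChar >> 12) & 0xF));
                outBuffer.append(toHex((aChar >> 8) & 0xF));
                outBuffer.append(toHex((aChar >> 4) & 0xF));
                outBuffer.append(toHex(aChar & 0xF));
            } else {
                outBuffer.append(aChar);
            }
        }
        return outBuffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9"));
    }
}
```

Escaped output of this kind stays pure ASCII, so a generated source file compiles the same way regardless of the encoding the compiler assumes.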
------------- PR: https://git.openjdk.org/jdk/pull/12726 From weijun at openjdk.org Fri Mar 17 21:56:23 2023 From: weijun at openjdk.org (Weijun Wang) Date: Fri, 17 Mar 2023 21:56:23 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v4] In-Reply-To: <1JBZe7nrM-HsVEItfK-3GPeXoX_glyM9SL4ZACUbLwk=.3a3cf62b-0960-4b03-80aa-2756bd1636dc@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <1JBZe7nrM-HsVEItfK-3GPeXoX_glyM9SL4ZACUbLwk=.3a3cf62b-0960-4b03-80aa-2756bd1636dc@github.com> Message-ID: On Fri, 17 Mar 2023 21:49:33 GMT, Weijun Wang wrote: >> Justin Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust CF test to read in with UTF-8 to fix failing test > > make/jdk/src/classes/build/tools/compileproperties/CompileProperties.java line 326: > >> 324: outBuffer.append(toHex((aChar >> 8) & 0xF)); >> 325: outBuffer.append(toHex((aChar >> 4) & 0xF)); >> 326: outBuffer.append(toHex(aChar & 0xF)); > > Sorry I don't know when this tool is called, but why is it still writing in `\unnnn` style? I probably understand it now, source code still needs escaping. When can we put in UTF-8 there as well? ------------- PR: https://git.openjdk.org/jdk/pull/12726 From matsaave at openjdk.org Fri Mar 17 22:08:23 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 17 Mar 2023 22:08:23 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v8] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
> > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed aarch64 and added load-acquire for resolution check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/3dc112b2..6600e6dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=06-07 Stats: 7 lines in 2 files changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From mchung at openjdk.org Fri Mar 17 22:08:56 2023 From: mchung at openjdk.org (Mandy Chung) Date: Fri, 17 Mar 2023 22:08:56 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library Message-ID: `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added.
------------- Commit messages: - JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library Changes: https://git.openjdk.org/jdk/pull/13085/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13085&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304163 Stats: 146 lines in 17 files changed: 87 ins; 15 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/13085.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13085/head:pull/13085 PR: https://git.openjdk.org/jdk/pull/13085 From jlu at openjdk.org Fri Mar 17 22:27:48 2023 From: jlu at openjdk.org (Justin Lu) Date: Fri, 17 Mar 2023 22:27:48 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v5] In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. 
Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Close streams when finished loading into props ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12726/files - new: https://git.openjdk.org/jdk/pull/12726/files/007c78a7..19b91e6b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=03-04 Stats: 15 lines in 3 files changed: 6 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726 From cjplummer at openjdk.org Sat Mar 18 00:22:54 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 18 Mar 2023 00:22:54 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" [v2] In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: <3YdaSsXTYFUWdJC0nuqC2QucAbr-BM5zHniKB6r1aOM=.703bf19d-1ba9-450b-95b7-e69684f2ee59@github.com> > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. 
It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. Chris Plummer has updated the pull request incrementally with three additional commits since the last revision: - fix spelling error - add comment - minor formatting fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13068/files - new: https://git.openjdk.org/jdk/pull/13068/files/4d28b92e..58ca27b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13068&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13068&range=00-01 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13068.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13068/head:pull/13068 PR: https://git.openjdk.org/jdk/pull/13068 From cjplummer at openjdk.org Sat Mar 18 00:27:34 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 18 Mar 2023 00:27:34 GMT Subject: Integrated: 8297638: Memory leak in case of many started-dead threads In-Reply-To: <0R9vbkkJyHw0-SJ1gXbFFxkkK33Tv7ZGg9rMFCyl5rY=.12890e1c-03e0-4aba-adbf-19ab174aa384@github.com> References: <0R9vbkkJyHw0-SJ1gXbFFxkkK33Tv7ZGg9rMFCyl5rY=.12890e1c-03e0-4aba-adbf-19ab174aa384@github.com> Message-ID: <7-wTMZQUL7BH1VjXRBj9j6wNdC7LMTiaovDdeYWX0ro=.71851330-ddcf-4896-8d59-7f0759a7703d@github.com> On Wed, 18 Jan 2023 20:06:33 GMT, Chris Plummer wrote: > Fix JDI leak when the debuggee creates a lot of threads, while at the same time the debugger is not sending any commands.
The lack of commands being sent results in code not being triggered that normally would clear out unreachable listeners and also clear out ObjectReferences queued for disposal. This pull request has now been integrated. Changeset: f8482c20 Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/f8482c20f4f55d4fc5b304a33c87775b5acfe2b8 Stats: 198 lines in 3 files changed: 197 ins; 0 del; 1 mod 8297638: Memory leak in case of many started-dead threads Reviewed-by: amenkov, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12081 From jpai at openjdk.org Sat Mar 18 00:34:20 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Sat, 18 Mar 2023 00:34:20 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 22:01:46 GMT, Mandy Chung wrote: > `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added. The changes look good to me ------------- Marked as reviewed by jpai (Reviewer). PR: https://git.openjdk.org/jdk/pull/13085 From dcubed at openjdk.org Sat Mar 18 00:42:24 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Sat, 18 Mar 2023 00:42:24 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" [v2] In-Reply-To: <3YdaSsXTYFUWdJC0nuqC2QucAbr-BM5zHniKB6r1aOM=.703bf19d-1ba9-450b-95b7-e69684f2ee59@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> <3YdaSsXTYFUWdJC0nuqC2QucAbr-BM5zHniKB6r1aOM=.703bf19d-1ba9-450b-95b7-e69684f2ee59@github.com> Message-ID: On Sat, 18 Mar 2023 00:22:54 GMT, Chris Plummer wrote: >> The debuggee main method creates two threads and then starts them: >> >> >> public static void main(String[] args) { >> System.out.println("Howdy!"); >> Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); >> Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); >> >> t1.start(); >> t2.start(); >> } >> >> >> These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. >> >> When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. 
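The daemon-thread behaviour described above can be reproduced outside the test with a small sketch. This is only an illustration under stated assumptions: it uses platform threads marked as daemon as a stand-in for virtual threads so that it runs on older JDKs, and `JoinBeforeExitSketch` and `runWorkers` are hypothetical names, not part of InvokeHangTest.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: daemon platform threads stand in for virtual threads,
// which are always daemon threads.
class JoinBeforeExitSketch {

    // Starts two daemon workers and joins both before returning, so the
    // caller cannot run ahead of them. Returns how many workers finished.
    static int runWorkers() throws InterruptedException {
        AtomicInteger finished = new AtomicInteger();
        Runnable work = () -> {
            try {
                Thread.sleep(50); // simulated work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            finished.incrementAndGet();
        };

        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();

        // Without these joins, main() could return and the JVM could begin
        // shutdown while the workers are still running.
        t1.join();
        t2.join();
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("workers finished: " + runWorkers());
    }
}
```

Removing the two `join()` calls from this sketch makes the count racy, which mirrors the intermittent nature of the failure described in the PR.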
> > Chris Plummer has updated the pull request incrementally with three additional commits since the last revision: > > - fix spelling error > - add comment > - minor formatting fix Thumbs up. Thanks for adding the comment. One possible nit typo... test/jdk/com/sun/jdi/InvokeHangTest.java line 65: > 63: try { > 64: // The join ensures that the test completes before we exit main(). If we are using > 65: // virtual threads, they are always daemon threads, and therefor the JVM will exit nit typo: s/therefor/therefore/ ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/13068 From cjplummer at openjdk.org Sat Mar 18 01:05:03 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 18 Mar 2023 01:05:03 GMT Subject: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" [v3] In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. 
It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: fix spelling error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13068/files - new: https://git.openjdk.org/jdk/pull/13068/files/58ca27b8..87b32dc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13068&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13068&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13068.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13068/head:pull/13068 PR: https://git.openjdk.org/jdk/pull/13068 From cjplummer at openjdk.org Sat Mar 18 04:48:13 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 18 Mar 2023 04:48:13 GMT Subject: RFR: 8304437: ProblemList com/sun/jdi/ThreadMemoryLeadTest.java with ZGC Message-ID: This new test is failing with ZGC, I believe on every run. Needs to be problem listed. 
------------- Commit messages: - ProblemList ThreadMemoryLeakTarg.java Changes: https://git.openjdk.org/jdk/pull/13087/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13087&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304437 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13087.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13087/head:pull/13087 PR: https://git.openjdk.org/jdk/pull/13087 From Alan.Bateman at oracle.com Sat Mar 18 07:21:44 2023 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Sat, 18 Mar 2023 07:21:44 +0000 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: On 17/03/2023 14:11, Thomas Stüfe wrote: > : > > Investigation shows that there seems to be a bug in attachListener.cpp > where we compare AttachOperation::name for "load", but it contains > "jcmd": When using the Attach API, the VirtualMachine.loadAgentXXX methods map to a "load" command. The Attach API, jstack, jmap ... pre-date the jcmd tool and have their own set of commands known to both the tool/client side and the VM side. The "jcmd" command, used by the jcmd tool, works a bit like an HTTP upgrade. So it is a different path, and you are right, jcmd JVMTI.agent_load was missed when EnableDynamicAgentLoading was added in JDK 9. I've created JDK-8304438 to track it. -Alan From alanb at openjdk.org Sat Mar 18 10:59:19 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 18 Mar 2023 10:59:19 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 22:01:46 GMT, Mandy Chung wrote: > `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead.
`ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added. Thanks for moving this to the test lib. On the tag ordering, the reason some of these tests had @modules before @library was just to follow the recommended ordering in the jtreg tag spec (https://openjdk.org/jtreg/tag-spec.html#ORDER). ------------- Marked as reviewed by alanb (Reviewer). PR: https://git.openjdk.org/jdk/pull/13085 From alanb at openjdk.org Sat Mar 18 11:27:20 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 18 Mar 2023 11:27:20 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: <-OJbhkKU3EtSS8E31eEd62h3-x5Szpl_Hk0apm1a6aQ=.687c660f-bc13-41cd-bc63-c59ca60300f0@github.com> On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for future performance/scalability improvements in JVMTI support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. Still useful implementation of old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_mount_end` >> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4.
Removed the`VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrisics implementation The most important case is when there is no JVMTI env. If I read the changes correctly, the overhead for park/continue changes from one volatile-read (notifyJvmtiEvents) to two plain-writes (JavaThread::_is_in_VTMS_transition). If a JVMTI env has been created then there is no benefit for the moment as there is still a call into the runtime to interact with JvmtiVTMSTransitionDisabler. So I think you are saying that is for follow-on PRs. ------------- PR: https://git.openjdk.org/jdk/pull/13054 From jpai at openjdk.org Sat Mar 18 13:04:19 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Sat, 18 Mar 2023 13:04:19 GMT Subject: RFR: 8304437: ProblemList com/sun/jdi/ThreadMemoryLeadTest.java with ZGC In-Reply-To: References: Message-ID: On Sat, 18 Mar 2023 04:41:37 GMT, Chris Plummer wrote: > This new test is failing with ZGC, I believe on every run. Needs to be problem listed. The problem listed entry looks fine to me. ------------- Marked as reviewed by jpai (Reviewer). PR: https://git.openjdk.org/jdk/pull/13087 From dcubed at openjdk.org Sat Mar 18 14:36:19 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Sat, 18 Mar 2023 14:36:19 GMT Subject: RFR: 8304437: ProblemList com/sun/jdi/ThreadMemoryLeadTest.java with ZGC In-Reply-To: References: Message-ID: On Sat, 18 Mar 2023 04:41:37 GMT, Chris Plummer wrote: > This new test is failing with ZGC, I believe on every run. Needs to be problem listed. Thumbs up. This is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/13087 From cjplummer at openjdk.org Sat Mar 18 17:11:39 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 18 Mar 2023 17:11:39 GMT Subject: Integrated: 8304437: ProblemList com/sun/jdi/ThreadMemoryLeadTest.java with ZGC In-Reply-To: References: Message-ID: On Sat, 18 Mar 2023 04:41:37 GMT, Chris Plummer wrote: > This new test is failing with ZGC, I believe on every run. Needs to be problem listed. This pull request has now been integrated. Changeset: 033c0b17 Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/033c0b17cbbf830ec28495761016d147902e4c42 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8304437: ProblemList com/sun/jdi/ThreadMemoryLeadTest.java with ZGC Reviewed-by: jpai, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/13087 From mchung at openjdk.org Sat Mar 18 19:14:09 2023 From: mchung at openjdk.org (Mandy Chung) Date: Sat, 18 Mar 2023 19:14:09 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library [v2] In-Reply-To: References: Message-ID: > `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added. 
Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: move @library after @modules per the recommended ordering ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13085/files - new: https://git.openjdk.org/jdk/pull/13085/files/3eda19b5..6b7611ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13085&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13085&range=00-01 Stats: 30 lines in 14 files changed: 15 ins; 15 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13085.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13085/head:pull/13085 PR: https://git.openjdk.org/jdk/pull/13085 From mchung at openjdk.org Sat Mar 18 19:14:11 2023 From: mchung at openjdk.org (Mandy Chung) Date: Sat, 18 Mar 2023 19:14:11 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 22:01:46 GMT, Mandy Chung wrote: > `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added. Thanks for the pointer to the recommended ordering. Tests updated. 
------------- PR: https://git.openjdk.org/jdk/pull/13085 From lmesnik at openjdk.org Sat Mar 18 19:54:51 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 18 Mar 2023 19:54:51 GMT Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE [v2] In-Reply-To: References: Message-ID: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com> > The com/sun/jdi tests are located in the on package, and classes with same name cause 'class duplication error' when this directory is opened as source code in IDE. > > The simplest fix to avoid this is just to rename class. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: updated name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13069/files - new: https://git.openjdk.org/jdk/pull/13069/files/9c6e8b89..6f455341 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13069&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13069&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13069.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13069/head:pull/13069 PR: https://git.openjdk.org/jdk/pull/13069 From lmesnik at openjdk.org Sat Mar 18 20:03:20 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 18 Mar 2023 20:03:20 GMT Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE [v2] In-Reply-To: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com> References: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com> Message-ID: On Sat, 18 Mar 2023 19:54:51 GMT, Leonid Mesnik wrote: >> The com/sun/jdi tests are located in the on package, and classes with same name cause 'class duplication error' when this directory is opened as source code in IDE. 
>> >> The simplest fix to avoid this is just to rename class. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > updated name Not sure there is any meaningful name for these classes. So I updated it to CLEClass1/2 so it uses CamelCase notation and has test name as prefix to be unique. ------------- PR: https://git.openjdk.org/jdk/pull/13069 From suenaga at oss.nttdata.com Sun Mar 19 02:51:40 2023 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Sun, 19 Mar 2023 11:51:40 +0900 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: <51bd4634-eae5-3471-525e-3af71229c53c@oss.nttdata.com> HI, I haven't followed this topic, but I think dynamic loading mechanism of JVMTI agent is useful for debugging. Can we change flag type of EnableDynamicAgentLoading to `manageable` from `product`? If so, we can use JVMTI agent without rebooting system when we encountered some troubles in production system. Thanks, Yasumasa On 2023/03/17 3:48, Ron Pressler wrote: > Hi. > > In JDK 21 we intend to disallow the dynamic loading of agents by default. This > will affect tools that use the Attach API to load an agent into a JVM some time > after the JVM has started [1]. There is no change to any of the mechanisms that > load an agent at JVM startup (-javaagent/-agentlib on the command line or the > Launcher-Agent-Class attribute in the main JAR's manifest). > > This change in default behavior was proposed in 2017 as part of JEP 261 [2][3]. > At that time the consensus was to switch to this default not in JDK 9 but in a > later release to give tool maintainers sufficient time to inform their users. > To allow the dynamic loading of agents, users will need to specify > -XX:+EnableDynamicAgentLoading on the command line. > > I'll post a draft JEP for review shortly. 
> > -- Ron > > [1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html > [2]: https://openjdk.org/jeps/261 > [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html From alanb at openjdk.org Sun Mar 19 07:05:19 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 19 Mar 2023 07:05:19 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library [v2] In-Reply-To: References: Message-ID: On Sat, 18 Mar 2023 19:14:09 GMT, Mandy Chung wrote: >> `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added. > > Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: > > move @library after @modules per the recommended ordering Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/13085 From Alan.Bateman at oracle.com Sun Mar 19 09:27:02 2023 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Sun, 19 Mar 2023 09:27:02 +0000 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: <51bd4634-eae5-3471-525e-3af71229c53c@oss.nttdata.com> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <51bd4634-eae5-3471-525e-3af71229c53c@oss.nttdata.com> Message-ID: On 19/03/2023 02:51, Yasumasa Suenaga wrote: > : > > Can we change flag type of EnableDynamicAgentLoading to `manageable` > from `product`? If so, we can use JVMTI agent without rebooting system > when we encountered some troubles in production system. If manageable then it could be enabled at run-time with HotSpotDiagnosticMXBean.setVMOption (or jcmd VM.set_flag), so I think wouldn't change anything. 
The main issue with JVMTI agents loaded into a running VM is that they can do anything. Even if their capabilities were reduced (and many debugging capabilities are only available in the onload phase) it can still use JNI and bypass access control. So I think a difficult security vs. serviceability trade-off here. -Alan. From kirk.pepperdine at gmail.com Sun Mar 19 16:00:00 2023 From: kirk.pepperdine at gmail.com (Kirk Pepperdine) Date: Sun, 19 Mar 2023 09:00:00 -0700 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <51bd4634-eae5-3471-525e-3af71229c53c@oss.nttdata.com> Message-ID: I need to retrace this thread to gain more context, but my initial thoughts went to all of the tools and techniques that I use and how vulnerable they are to this change vs. what the motivation is for this change. My initial assessment is that this is going to heavily impact visibility and wipe out the use of so many tools, making troubleshooting so much more difficult than it already is. Especially if you subscribe to a top-down, methodical, targeted approach to troubleshooting rather than a shotgun gather-everything-you-can methodology. The former often requires re-instrumentation on the fly. Shutting down to restart when some problems may take a couple of weeks to show really isn't a great option. I guess you could just turn things back on, but then I'd likely recommend that as an across-the-board setting. Again, I need to dig about to get more context. Kind regards, Kirk > On Mar 19, 2023, at 2:27 AM, Alan Bateman wrote: > > On 19/03/2023 02:51, Yasumasa Suenaga wrote: >> : >> >> Can we change flag type of EnableDynamicAgentLoading to `manageable` from `product`? If so, we can use JVMTI agent without rebooting system when we encountered some troubles in production system.
> > If manageable then it could be enabled at run-time with HotSpotDiagnosticMXBean.setVMOption (or jcmd VM.set_flag), so I think wouldn't change anything. The main issue with JVMTI agents loaded into a running VM is that they can do anything. Even if their capabilities were reduced (and many debugging capabilities are only available in the onload phase) it can still use JNI and bypass access control. So I think a difficult security vs. serviceability trade-off here. > > -Alan. From lmesnik at openjdk.org Sun Mar 19 16:52:20 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 19 Mar 2023 16:52:20 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for future performance/scalability improvements in JVMTI support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. Still useful implementation of old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` = > `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_mount_end` >> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4. 
Removed the`VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrisics implementation I haven't reviewed C2 changes, all other changes look good to me. ------------- Marked as reviewed by lmesnik (Reviewer). PR: https://git.openjdk.org/jdk/pull/13054 From andrei.pangin at gmail.com Mon Mar 20 01:21:28 2023 From: andrei.pangin at gmail.com (Andrei Pangin) Date: Mon, 20 Mar 2023 01:21:28 +0000 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: Hi all, Serviceability has been one of the biggest Java strengths, but the proposed change is going to have a large negative impact on it. Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app in runtime. JFR cannot close this gap due to lack of capabilities modern Java profilers have (that's a separate topic though). When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not be even feasible to add an agent at startup in containerized applications. Starting profiler on demand from the host OS or from a sidecar is the only viable solution in these cases. 
Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance. Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other. The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is to get a value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching for fixing trivial bugs or for adding debug logs dynamically. The prominent example is when the dynamic agent has proved irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046. I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases. Thank you, Andrei On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote: > Hi. > > In JDK 21 we intend to disallow the dynamic loading of agents by default. > This > will affect tools that use the Attach API to load an agent into a JVM some > time > after the JVM has started [1]. There is no change to any of the mechanisms > that > load an agent at JVM startup (-javaagent/-agentlib on the command line or > the > Launcher-Agent-Class attribute in the main JAR's manifest). > > This change in default behavior was proposed in 2017 as part of JEP 261 > [2][3]. > At that time the consensus was to switch to this default not in JDK 9 but > in a > later release to give tool maintainers sufficient time to inform their > users. > To allow the dynamic loading of agents, users will need to specify > -XX:+EnableDynamicAgentLoading on the command line. > > I'll post a draft JEP for review shortly.
> > -- Ron > > [1]: > https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html > [2]: https://openjdk.org/jeps/261 > [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From lmesnik at openjdk.org Mon Mar 20 01:37:23 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 20 Mar 2023 01:37:23 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library [v2] In-Reply-To: References: Message-ID: On Sat, 18 Mar 2023 19:14:09 GMT, Mandy Chung wrote: >> `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added. > > Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: > > move @library after @modules per the recommended ordering Changes requested by lmesnik (Reviewer). test/jdk/java/lang/ModuleTests/AnnotationsTest.java line 61: > 59: * java.base/jdk.internal.module > 60: * @library /test/lib > 61: * @build jdk.test.lib.util.ModuleInfoWriter You don't need to build library classes explicitly. I think @library /test/lib it enough. 
------------- PR: https://git.openjdk.org/jdk/pull/13085 From gcao at openjdk.org Mon Mar 20 01:47:34 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 20 Mar 2023 01:47:34 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v8] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 22:08:23 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 and added load-acquire for resolution check Hi, I have updated the riscv related code by referring to the latest aarch64 related changes, please help me to update it. 
I tested hotspot and jdk tier1, and no new errors were introduced: https://github.com/zifeihan/jdk/commit/9c17c5b4953eebdebc6eb84b90a2ff9ca97c78c5 (on this branch: https://github.com/zifeihan/jdk/commits/12778_riscv_port) ------------- PR: https://git.openjdk.org/jdk/pull/12778 From jpai at openjdk.org Mon Mar 20 07:00:24 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 20 Mar 2023 07:00:24 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library [v2] In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 01:34:11 GMT, Leonid Mesnik wrote: >> Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: >> >> move @library after @modules per the recommended ordering > > test/jdk/java/lang/ModuleTests/AnnotationsTest.java line 61: > >> 59: * java.base/jdk.internal.module >> 60: * @library /test/lib >> 61: * @build jdk.test.lib.util.ModuleInfoWriter > > You don't need to build library classes explicitly. I think @library /test/lib it enough. Hello @lmesnik, on the contrary, these build directives are recommended (and based on some of the issues we have encountered, are in fact necessary). The jtreg documentation has this to say https://openjdk.org/jtreg/tag-spec.html: > In general, classes in library directories are not automatically compiled as part of a compilation command explicitly naming the source files containing those classes. A test that relies upon library classes should contain appropriate @build directives to ensure that the classes will be compiled. It is strongly recommended that tests do not rely on the use of implicit compilation by the Java compiler. Such an approach is generally fragile, and may lead to incomplete recompilation when a test or library code has been modified.
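To make the point above concrete, here is a minimal sketch of a test header that lists `@modules` before `@library` and names the library class in an explicit `@build`. The `@modules`, `@library` and `@build` lines are the ones quoted in this thread; the class `JtregHeaderSketch`, the `HEADER` constant and the `tagsInOrder` helper are hypothetical and exist only to show the tag order. They are not part of jtreg or of the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class JtregHeaderSketch {

    // Illustrative header only. The @modules/@library/@build lines come from
    // the review thread; the real test header may contain further tags.
    static final String HEADER = String.join("\n",
        "/*",
        " * @test",
        " * @modules java.base/jdk.internal.classfile",
        " *          java.base/jdk.internal.module",
        " * @library /test/lib",
        " * @build jdk.test.lib.util.ModuleInfoWriter",
        " */");

    // Hypothetical helper: returns the tag names in the order they appear.
    static List<String> tagsInOrder(String header) {
        List<String> tags = new ArrayList<>();
        for (String line : header.split("\n")) {
            String s = line.trim();
            if (s.startsWith("*")) {
                s = s.substring(1).trim();   // drop the leading comment star
            }
            if (s.startsWith("@")) {
                int end = s.indexOf(' ');
                tags.add(end < 0 ? s : s.substring(0, end));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Prints: [@test, @modules, @library, @build]
        System.out.println(tagsInOrder(HEADER));
    }
}
```

The explicit `@build` line is what asks jtreg to compile `jdk.test.lib.util.ModuleInfoWriter` from `/test/lib` before the test runs, rather than relying on implicit compilation by the Java compiler.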
------------- PR: https://git.openjdk.org/jdk/pull/13085 From sspitsyn at openjdk.org Mon Mar 20 07:18:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 Mar 2023 07:18:24 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-OJbhkKU3EtSS8E31eEd62h3-x5Szpl_Hk0apm1a6aQ=.687c660f-bc13-41cd-bc63-c59ca60300f0@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> <-OJbhkKU3EtSS8E31eEd62h3-x5Szpl_Hk0apm1a6aQ=.687c660f-bc13-41cd-bc63-c59ca60300f0@github.com> Message-ID: On Sat, 18 Mar 2023 11:24:47 GMT, Alan Bateman wrote: > The most important case is when there is no JVMTI env. If I read the changes correctly, the overhead for park/continue changes from one volatile-read (notifyJvmtiEvents) to two plain-writes (JavaThread::_is_in_VTMS_transition). > > If a JVMTI env has been created then there is no benefit for the moment as there is still a call into the runtime to interact with JvmtiVTMSTransitionDisabler. So I think you are saying that is for follow-on PRs. @AlanBateman Yes, your conclusion is correct. 
------------- PR: https://git.openjdk.org/jdk/pull/13054 From sspitsyn at openjdk.org Mon Mar 20 07:18:27 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 Mar 2023 07:18:27 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for future performance/scalability improvements in JVMTI support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. Still useful implementation of old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` = > `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_mount_end` >> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4. Removed the`VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrisics implementation Thank you for review, Leonid! ------------- PR: https://git.openjdk.org/jdk/pull/13054 From sspitsyn at openjdk.org Mon Mar 20 07:20:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 Mar 2023 07:20:24 GMT Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE [v2] In-Reply-To: References: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com> Message-ID: On Sat, 18 Mar 2023 20:00:47 GMT, Leonid Mesnik wrote: > Not sure there is any meaningful name for these classes. So I updated it to CLEClass1/2 so it uses CamelCase notation and has test name as prefix to be unique. What does CLE stand for? ------------- PR: https://git.openjdk.org/jdk/pull/13069 From ron.pressler at oracle.com Mon Mar 20 10:10:49 2023 From: ron.pressler at oracle.com (Ron Pressler) Date: Mon, 20 Mar 2023 10:10:49 +0000 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com> Hi. The majority of serviceability tools don?t require dynamically loading an agent, and the majority of applications never load an agent dynamically. True, there are some tools that will be affected, which is why the decision was to introduce the flag in JDK 9 and to announce this change, but change the default in a later version to give tools ample time to prepare their users. The rationale for this change then hasn?t changed, but will be reiterated in a JEP (we just wanted to announce this ahead of the JEP to give tool authors another reminder more than six months ahead of JDK 21). 
The only change between then and now is that even fewer use cases require dynamically loaded agents, and so the impact is even smaller.

It is also true that, when starting an application you don't know that you *will* need to load an agent, but in most situations you know that you might. E.g. processes that are too critical to bring down even for deep maintenance (although not many of these are written in modern versions of Java anyway) or canary services that are under trial. The relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers; these users are better equipped to weigh the risks and tradeoffs involved.

Finally, some tools that require dynamically loaded JVM TI agents, such as profilers that profile native code, are so tied to the VM's internals that the best place for them is in the JDK. If anything, the bigger problem is not that profilers are used too much in production, but too little, including less advanced ones that don't require an agent. There is plenty of time to enhance the JDK's built-in profiling capabilities ahead of demand.

-- Ron

On 20 Mar 2023, at 01:21, Andrei Pangin wrote:

Hi all,

Serviceability has been one of the biggest Java strengths, but the proposed change is going to have a large negative impact on it.

Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app at runtime. JFR cannot close this gap because it lacks capabilities that modern Java profilers have (that's a separate topic though).

When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not even be feasible to add an agent at startup in containerized applications. Starting a profiler on demand from the host OS or from a sidecar is the only viable solution in these cases.
Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance. Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other.

The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is to get a value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching for fixing trivial bugs or for adding debug logs dynamically. A prominent example is when a dynamic agent proved an irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046.

I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases.

Thank you,
Andrei

On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote:

Hi.

In JDK 21 we intend to disallow the dynamic loading of agents by default. This will affect tools that use the Attach API to load an agent into a JVM some time after the JVM has started [1]. There is no change to any of the mechanisms that load an agent at JVM startup (-javaagent/-agentlib on the command line or the Launcher-Agent-Class attribute in the main JAR's manifest).

This change in default behavior was proposed in 2017 as part of JEP 261 [2][3]. At that time the consensus was to switch to this default not in JDK 9 but in a later release to give tool maintainers sufficient time to inform their users. To allow the dynamic loading of agents, users will need to specify -XX:+EnableDynamicAgentLoading on the command line.

I'll post a draft JEP for review shortly.
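
For readers unfamiliar with the mechanism under discussion, the following is a minimal sketch, not taken from any message in this thread, of how a tool might load an agent dynamically through the Attach API, and of how a target JVM might be launched with the flag named above. The class name `DynamicAttachSketch` and the helpers `launchCommand` and `loadAgentInto` are hypothetical names used only for illustration; the process id, `agent.jar` and `app.jar` are placeholders.

```java
import com.sun.tools.attach.VirtualMachine;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not from the thread.
final class DynamicAttachSketch {

    // Builds a launch command for the target application. When
    // allowDynamicLoading is true, the flag named in the announcement
    // (-XX:+EnableDynamicAgentLoading) is added before -jar.
    public static List<String> launchCommand(String appJar, boolean allowDynamicLoading) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (allowDynamicLoading) {
            cmd.add("-XX:+EnableDynamicAgentLoading");
        }
        cmd.add("-jar");
        cmd.add(appJar);
        return cmd;
    }

    // Attaches to an already running JVM, identified by its process id,
    // and asks it to load a Java agent JAR (dynamic agent loading).
    public static void loadAgentInto(String pid, String agentJar) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJar);
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // usage: java DynamicAttachSketch.java <pid> <path-to-agent.jar>
            loadAgentInto(args[0], args[1]);
        } else {
            // With no arguments, just print the assumed launch command.
            System.out.println(String.join(" ", launchCommand("app.jar", true)));
        }
    }
}
```

Under the default proposed above, the `loadAgent` call would be expected to succeed only if the target JVM had been started with the flag; how a refusal is reported to the attaching tool is not stated in this thread.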
-- Ron

[1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
[2]: https://openjdk.org/jeps/261
[3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From j.bachorik at gmail.com Mon Mar 20 10:36:01 2023
From: j.bachorik at gmail.com (Jaroslav Bachorik)
Date: Mon, 20 Mar 2023 11:36:01 +0100
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To: <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>
Message-ID:

Hi,

On Mon, Mar 20, 2023 at 11:11 AM Ron Pressler wrote:

> Hi.
>
> The majority of serviceability tools don't require dynamically loading an
> agent, and the majority of applications never load an agent dynamically.
>

The majority of the JDK built-in tools, I would say. What about e.g. the JMC agent?

>
> True, there are some tools that will be affected, which is why the
> decision was to introduce the flag in JDK 9 and to announce this change,
> but change the default in a later version to give tools ample time to
> prepare their users. The rationale for this change then hasn't changed, but
> will be reiterated in a JEP (we just wanted to announce this ahead of the
> JEP to give tool authors another reminder more than six months ahead of JDK
> 21). The only change between then and now is that even fewer use cases
> require dynamically loaded agents, and so the impact is even smaller.
>

As a maintainer of one such tool I can confidently say that this change will either kill the tool as the ease of use will be gone or the workaround (e.g. using JAVA_TOOL_OPTIONS) will completely defeat the purpose of this change.
Having to put a flag when starting the JVM to allow dynamic loading of agents sounds a bit nonsensical to me - it would be much easier to directly add the agent to the JVM startup and then implement a lightweight control protocol over socket/shared memory to enable/disable the agent features dynamically.

>
> It is also true that, when starting an application you don't know that you
> *will* need to load an agent, but in most situations you know that you
> might. E.g. processes that are too critical to bring down even for deep
> maintenance (although not many of these are written in modern versions of
> Java anyway) or canary services that are under trial. The relatively few
> sophisticated users who know how to write ad-hoc agents can even opt to
> enable dynamic agent loading on all their servers; these users are better
> equipped to weigh the risks and tradeoffs involved.
>

Wouldn't having this enabled system-wide actually defeat the purpose of having this flag? Considering that the dynamic attach can be performed only on the same host under the same user as the target process there seems to be a very small chance of loading agents accidentally. In the end people would set up their systems to enable dynamic agent loading via e.g. JAVA_TOOL_OPTIONS and we will be in the same place as before, with the additional hurdle of setting everything up.

> Finally, some tools that require dynamically loaded JVM TI agents, such
> as profilers that profile native code, are so tied to the VM's internals
> that the best place for them is in the JDK. If anything, the bigger problem
> is not that profilers are used too much in production, but too little,
> including less advanced ones that don't require an agent. There is plenty
> of time to enhance the JDK's built-in profiling capabilities ahead of
> demand.
>

I think this is an overly optimistic view.
It is *much more* difficult to enhance the JDK's built-in profiling capabilities than to do the same in an external profiling agent.

Overall, I don't seem to understand the anticipated attack vectors this change is supposed to prevent. AFAIK, in order to do the dynamic agent load one needs to have full access to the target process. That means that there are more convenient and straightforward ways to do anything nefarious than loading a JVMTI agent. Am I missing some other usages where the JVMTI agent would actually give access to something which would be otherwise inaccessible considering that the attacher and attachee must be on the same host and under the same user?

Cheers,

-JB-

>
> -- Ron
>
> On 20 Mar 2023, at 01:21, Andrei Pangin wrote:
>
> Hi all,
>
> Serviceability has been one of the biggest Java strengths, but the
> proposed change is going to have a large negative impact on it.
>
> Disallowing dynamic agents by default means it will no longer be possible
> to attach a profiler to a running app at runtime. JFR cannot close this gap
> because it lacks capabilities that modern Java profilers have (that's a
> separate topic though).
>
> When an issue happens with a live app, it's already too late to add a
> command line argument. Furthermore, it may not even be feasible to add an
> agent at startup in containerized applications. Starting a profiler on
> demand from the host OS or from a sidecar is the only viable solution in
> these cases.
>
> Next, it's hard to predict beforehand what tools exactly might be useful
> for troubleshooting: e.g., one tool may be better for finding memory leaks,
> a different one for analyzing CPU performance. Adding all possible tools at
> startup does not seem a reasonable approach, especially when tools may
> conflict with each other.
>
> The most important aspect of dynamic agents is the possibility to make a
> special tool just in time for solving a particular problem.
A typical
> example is to get a value of some field in a live app without dumping the
> entire 60 GB heap. Another common use case is hot patching for fixing
> trivial bugs or for adding debug logs dynamically. A prominent example is
> when a dynamic agent proved an irreplaceable aid in addressing the
> notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046.
>
> I would be grateful to know more about the reasons why we should give up
> all the above advantages of dynamic agents in their good and legitimate use
> cases.
>
> Thank you,
> Andrei
>
> On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote:
>
>> Hi.
>>
>> In JDK 21 we intend to disallow the dynamic loading of agents by default.
>> This will affect tools that use the Attach API to load an agent into a JVM
>> some time after the JVM has started [1]. There is no change to any of the
>> mechanisms that load an agent at JVM startup (-javaagent/-agentlib on the
>> command line or the Launcher-Agent-Class attribute in the main JAR's
>> manifest).
>>
>> This change in default behavior was proposed in 2017 as part of JEP 261
>> [2][3]. At that time the consensus was to switch to this default not in
>> JDK 9 but in a later release to give tool maintainers sufficient time to
>> inform their users. To allow the dynamic loading of agents, users will
>> need to specify -XX:+EnableDynamicAgentLoading on the command line.
>>
>> I'll post a draft JEP for review shortly.
>>
>> -- Ron
>>
>> [1]:
>> https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
>> [2]: https://openjdk.org/jeps/261
>> [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ecki at zusammenkunft.net Mon Mar 20 12:03:20 2023
From: ecki at zusammenkunft.net (Bernd)
Date: Mon, 20 Mar 2023 13:03:20 +0100
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>,
Message-ID: <3C96D2B9-1A2D-644C-B039-837EE7D22291@hxcore.ol>

An HTML attachment was scrubbed...
URL:

From volker.simonis at gmail.com Mon Mar 20 12:16:59 2023
From: volker.simonis at gmail.com (Volker Simonis)
Date: Mon, 20 Mar 2023 13:16:59 +0100
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>
Message-ID:

Hi Ron,

I'm still missing convincing technical arguments for disallowing dynamic loading of agents.

If the argument is security then I can only agree with previous answers in that an attacker needs local access with the same credentials as the attacked JVM. But once he has that, all bets are off anyway.

If you plan for features/enhancements/optimizations that rely on not being able to dynamically load an agent (which I haven't heard of yet), I don't understand this change either. Because as long as a switch for enabling dynamic loading exists (and I haven't heard that you want to completely forbid it) the dynamic loading use case has to be supported anyway.

Dynamic agent loading is one of the features which sets the OpenJDK apart from other languages, managed runtimes and even closely related platforms like for example GraalVM Native Image which don't support such a feature. The mere existence of tools which rely on it and which are in widespread productive use demonstrates its usefulness. And it is always good to know you have this possibility in your toolbox for the worst case (e.g. our log4j-hotpatcher [1]).
I also can't buy your argument that "the relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers". It is *exactly* not the few sophisticated authors of dynamic agents who would need to enable them but instead the millions of ingenious end-users and administrators who beg for help once they run into trouble. The other way round makes much more sense to me - the few sophisticated users who know for sure that they will never need the help of dynamic agents are free to disable them at startup.

Given the current arguments, for me the usefulness of dynamic agents outweighs their drawbacks by far. Of course every OpenJDK distributor is free to change the default settings of command line options at his sole discretion, but I don't currently see a compelling reason for doing this by default for the whole OpenJDK community.

If you have future plans which rely on disabling/forbidding dynamic agents please let us know.

Best regards,
Volker

[1] https://aws.amazon.com/blogs/opensource/hotpatch-for-apache-log4j/

On Mon, Mar 20, 2023 at 11:37 AM Jaroslav Bachorik wrote:
>
> Hi,
>
> On Mon, Mar 20, 2023 at 11:11 AM Ron Pressler wrote:
>>
>> Hi.
>>
>> The majority of serviceability tools don't require dynamically loading an agent, and the majority of applications never load an agent dynamically.
>
>
> The majority of the JDK built-in tools, I would say. What about e.g. the JMC agent?
>
>>
>>
>> True, there are some tools that will be affected, which is why the decision was to introduce the flag in JDK 9 and to announce this change, but change the default in a later version to give tools ample time to prepare their users. The rationale for this change then hasn't changed, but will be reiterated in a JEP (we just wanted to announce this ahead of the JEP to give tool authors another reminder more than six months ahead of JDK 21).
The only change between then and now is that even fewer use cases require dynamically loaded agents, and so the impact is even smaller.
>
>
> As a maintainer of one such tool I can confidently say that this change will either kill the tool as the ease of use will be gone or the workaround (e.g. using JAVA_TOOL_OPTIONS) will completely defeat the purpose of this change. Having to put a flag when starting the JVM to allow dynamic loading of agents sounds a bit nonsensical to me - it would be much easier to directly add the agent to the JVM startup and then implement a lightweight control protocol over socket/shared memory to enable/disable the agent features dynamically.
>
>>
>>
>> It is also true that, when starting an application you don't know that you *will* need to load an agent, but in most situations you know that you might. E.g. processes that are too critical to bring down even for deep maintenance (although not many of these are written in modern versions of Java anyway) or canary services that are under trial. The relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers; these users are better equipped to weigh the risks and tradeoffs involved.
>
>
> Wouldn't having this enabled system-wide actually defeat the purpose of having this flag? Considering that the dynamic attach can be performed only on the same host under the same user as the target process there seems to be a very small chance of loading agents accidentally. In the end people would set up their systems to enable dynamic agent loading via e.g. JAVA_TOOL_OPTIONS and we will be in the same place as before, with the additional hurdle of setting everything up.
>
>>
>> Finally, some tools that require dynamically loaded JVM TI agents, such as profilers that profile native code, are so tied to the VM's internals that the best place for them is in the JDK.
If anything, the bigger problem is not that profilers are used too much in production, but too little, including less advanced ones that don't require an agent. There is plenty of time to enhance the JDK's built-in profiling capabilities ahead of demand.
>
>
> I think this is an overly optimistic view. It is *much more* difficult to enhance the JDK's built-in profiling capabilities than to do the same in an external profiling agent.
>
>
> Overall, I don't seem to understand the anticipated attack vectors this change is supposed to prevent. AFAIK, in order to do the dynamic agent load one needs to have full access to the target process. That means that there are more convenient and straightforward ways to do anything nefarious than loading a JVMTI agent. Am I missing some other usages where the JVMTI agent would actually give access to something which would be otherwise inaccessible considering that the attacher and attachee must be on the same host and under the same user?
>
> Cheers,
>
> -JB-
>
>>
>>
>> -- Ron
>>
>> On 20 Mar 2023, at 01:21, Andrei Pangin wrote:
>>
>> Hi all,
>>
>> Serviceability has been one of the biggest Java strengths, but the proposed change is going to have a large negative impact on it.
>>
>> Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app at runtime. JFR cannot close this gap because it lacks capabilities that modern Java profilers have (that's a separate topic though).
>>
>> When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not even be feasible to add an agent at startup in containerized applications. Starting a profiler on demand from the host OS or from a sidecar is the only viable solution in these cases.
>>
>> Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance.
Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other.
>>
>> The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is to get a value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching for fixing trivial bugs or for adding debug logs dynamically. A prominent example is when a dynamic agent proved an irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046.
>>
>> I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases.
>>
>> Thank you,
>> Andrei
>>
>> On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote:
>>>
>>> Hi.
>>>
>>> In JDK 21 we intend to disallow the dynamic loading of agents by default. This
>>> will affect tools that use the Attach API to load an agent into a JVM some time
>>> after the JVM has started [1]. There is no change to any of the mechanisms that
>>> load an agent at JVM startup (-javaagent/-agentlib on the command line or the
>>> Launcher-Agent-Class attribute in the main JAR's manifest).
>>>
>>> This change in default behavior was proposed in 2017 as part of JEP 261 [2][3].
>>> At that time the consensus was to switch to this default not in JDK 9 but in a
>>> later release to give tool maintainers sufficient time to inform their users.
>>> To allow the dynamic loading of agents, users will need to specify
>>> -XX:+EnableDynamicAgentLoading on the command line.
>>>
>>> I'll post a draft JEP for review shortly.
>>>
>>> -- Ron
>>>
>>> [1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
>>> [2]: https://openjdk.org/jeps/261
>>> [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html
>>
>>

From matsaave at openjdk.org Mon Mar 20 14:29:35 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Mon, 20 Mar 2023 14:29:35 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9]
In-Reply-To:
References:
Message-ID:

> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>
> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>
> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>
> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>
> This change supports the following platforms: x86, aarch64, PPC, and RISCV

Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:

  Fix riscv interpreter mistake and acquire semantics

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12778/files
  - new: https://git.openjdk.org/jdk/pull/12778/files/6600e6dc..8607f62a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=07-08

  Stats: 18 lines in 4 files changed: 4 ins; 7 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/12778.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778

PR: https://git.openjdk.org/jdk/pull/12778

From kirk.pepperdine at gmail.com Mon Mar 20 17:02:19 2023
From: kirk.pepperdine at gmail.com (Kirk Pepperdine)
Date: Mon, 20 Mar 2023 10:02:19 -0700
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To: <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>
Message-ID:

Hi Ron,

> On Mar 20, 2023, at 3:10 AM, Ron Pressler wrote:
>
> Hi.
>
> The majority of serviceability tools don't require dynamically loading an agent, and the majority of applications never load an agent dynamically.

While I wouldn't be surprised that the majority don't load agents dynamically, I wouldn't want to diminish the importance of this capability for those that do make use of it. And I believe the number that do dynamically load might surprise you. But then, my data on this is likely highly biased. Do you have better data to support this viewpoint?

>
> True, there are some tools that will be affected, which is why the decision was to introduce the flag in JDK 9 and to announce this change, but change the default in a later version to give tools ample time to prepare their users.
The rationale for this change then hasn't changed, but will be reiterated in a JEP (we just wanted to announce this ahead of the JEP to give tool authors another reminder more than six months ahead of JDK 21). The only change between then and now is that even fewer use cases require dynamically loaded agents, and so the impact is even smaller.

Again, I'm not sure I see the data to support this. But then again, my viewpoint remains highly biased. And I see an assumption that tools will be able to easily adapt to this change. I'm not sure that is entirely true. At least not in a way that in effect returns dynamic attach capabilities with a directly loaded proxy.

>
> It is also true that, when starting an application you don't know that you *will* need to load an agent, but in most situations you know that you might. E.g. processes that are too critical to bring down even for deep maintenance (although not many of these are written in modern versions of Java anyway) or canary services that are under trial. The relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers; these users are better equipped to weigh the risks and tradeoffs involved.

Again, I'm not sure I'd equate numbers to importance. As an analogy, there are very few people that know how to build cars and maybe more that know how to fix them, but there are certainly many, many more that know how to use them.

>
> Finally, some tools that require dynamically loaded JVM TI agents, such as profilers that profile native code, are so tied to the VM's internals that the best place for them is in the JDK. If anything, the bigger problem is not that profilers are used too much in production, but too little, including less advanced ones that don't require an agent. There is plenty of time to enhance the JDK's built-in profiling capabilities ahead of demand.

At odds is that all profilers come with biases.
I've always stressed that the first thing one needs to do with any profiler is discover its biases and then determine how that bias affects the results, how one should account for the bias, or even whether another tool should be used. To this point, JFR, a built-in profiler, has significant biases. For example, allocation profiling, quite often, completely misses allocation hotspots for small objects that are not scalar replaced in all but trivial examples. But let's not pick on JFR because it is the tool of choice for many other things and other allocation profilers do have other biases such as altering JIT behavior that may cause EA to fail thus preventing otherwise eligible allocation hotspots from being scalar replaced. While one tool is blind, the other generates false positives. Knowing this, I can combine the JIT logs with profiler results to help offset the effects of the bias for the latter profile. However, I can't do anything for the blind spot.

Finally, if there is any lesson to be learned from the migrations from 8 to 9, it is that tooling is a huge anchor preventing people from upgrading. That JDK 8 is still in as widespread use as it is, is in no small part due to the extensive change in the tooling chain. In fact, a number of very useful tools simply didn't survive, leaving us with less desirable alternatives. The other historical data point that may be of comparison is the introduction of generics into the language. While this slowed the adoption of JDK 5 (from 1.4.2), it had nowhere near the impact that the degradation of the observability/diagnostic tool chain had on the migration rates from 7 to 8 and then this huge impact of 9. In my opinion, we've learned enough from this migration to understand that we may need to re-evaluate decisions that were made prior to these learnings.

Kind regards,
Kirk

>
> --
Ron
>
>> On 20 Mar 2023, at 01:21, Andrei Pangin wrote:
>>
>> Hi all,
>>
>> Serviceability has been one of the biggest Java strengths, but the proposed change is going to have a large negative impact on it.
>>
>> Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app at runtime. JFR cannot close this gap because it lacks capabilities that modern Java profilers have (that's a separate topic though).
>>
>> When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not even be feasible to add an agent at startup in containerized applications. Starting a profiler on demand from the host OS or from a sidecar is the only viable solution in these cases.
>>
>> Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance. Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other.
>>
>> The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is to get a value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching for fixing trivial bugs or for adding debug logs dynamically. A prominent example is when a dynamic agent proved an irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046.
>>
>> I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases.
>>
>> Thank you,
>> Andrei
>>
>> On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote:
>> Hi.
>>
>> In JDK 21 we intend to disallow the dynamic loading of agents by default.
This
>> will affect tools that use the Attach API to load an agent into a JVM some time
>> after the JVM has started [1]. There is no change to any of the mechanisms that
>> load an agent at JVM startup (-javaagent/-agentlib on the command line or the
>> Launcher-Agent-Class attribute in the main JAR's manifest).
>>
>> This change in default behavior was proposed in 2017 as part of JEP 261 [2][3].
>> At that time the consensus was to switch to this default not in JDK 9 but in a
>> later release to give tool maintainers sufficient time to inform their users.
>> To allow the dynamic loading of agents, users will need to specify
>> -XX:+EnableDynamicAgentLoading on the command line.
>>
>> I'll post a draft JEP for review shortly.
>>
>> -- Ron
>>
>> [1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
>> [2]: https://openjdk.org/jeps/261
>> [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mchung at openjdk.org Mon Mar 20 17:12:56 2023
From: mchung at openjdk.org (Mandy Chung)
Date: Mon, 20 Mar 2023 17:12:56 GMT
Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 20 Mar 2023 06:57:32 GMT, Jaikiran Pai wrote:

>> test/jdk/java/lang/ModuleTests/AnnotationsTest.java line 61:
>>
>>> 59:  *          java.base/jdk.internal.module
>>> 60:  * @library /test/lib
>>> 61:  * @build jdk.test.lib.util.ModuleInfoWriter
>>
>> You don't need to build library classes explicitly. I think @library /test/lib is enough.
>
> Hello @lmesnik, on the contrary, these build directives are recommended (and based on some of the issues we have encountered, are in fact necessary).
The jtreg documentation has this to say https://openjdk.org/jtreg/tag-spec.html:
>
>> In general, classes in library directories are not automatically compiled as part of a compilation command explicitly naming the source files containing those classes. A test that relies upon library classes should contain appropriate @build directives to ensure that the classes will be compiled. It is strongly recommended that tests do not rely on the use of implicit compilation by the Java compiler. Such an approach is generally fragile, and may lead to incomplete recompilation when a test or library code has been modified.

Explicit compilation is exactly the reason for adding `@build`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13085#discussion_r1142447529

From mchung at openjdk.org Mon Mar 20 17:33:33 2023
From: mchung at openjdk.org (Mandy Chung)
Date: Mon, 20 Mar 2023 17:33:33 GMT
Subject: Integrated: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library
In-Reply-To:
References:
Message-ID:

On Fri, 17 Mar 2023 22:01:46 GMT, Mandy Chung wrote:

> `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added.

This pull request has now been integrated.
Changeset: 622f2394
Author: Mandy Chung
URL: https://git.openjdk.org/jdk/commit/622f239448c2a96a74202621ee84c181d79fbde4
Stats: 154 lines in 17 files changed: 91 ins; 19 del; 44 mod

8304163: Move jdk.internal.module.ModuleInfoWriter to the test library

Reviewed-by: jpai, alanb

-------------
PR: https://git.openjdk.org/jdk/pull/13085

From ron.pressler at oracle.com Mon Mar 20 17:53:25 2023
From: ron.pressler at oracle.com (Ron Pressler)
Date: Mon, 20 Mar 2023 17:53:25 +0000
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>
Message-ID:

Hi Kirk.

While the JEP will reiterate the relevant considerations (and no one denies that dynamically loaded agents are not useful) that led to this change being announced some years ago, the purpose of my email was to announce it will finally take effect in JDK 21. All the discussions at the time, over all the relevant areas, informed the design at the core of the platform and its evolution in the past five years, namely that the application must grant explicit consent to anything affecting integrity (i.e. guarantees you can trust). Unless something has changed dramatically since then, or some new information has come to light, reopening the discussions around past decisions that have shaped the platform's current design is unlikely to yield different results. Many of those discussions are available on jigsaw-dev.

It is because some people may be affected that JEP 261 postponed the changing of that default, so that everyone would have time to prepare and prepare their users. We've now given an extra six-month advance notice to give those who haven't finished preparing their users the time to do so. This is an opportunity to remind everyone that other capabilities that similarly affect platform and application integrity, such as JNI and Unsafe,
will also require the application's consent on the command line, not in JDK 21, but soon thereafter.

-- Ron

On 20 Mar 2023, at 17:02, Kirk Pepperdine wrote:

Hi Ron,

On Mar 20, 2023, at 3:10 AM, Ron Pressler wrote:

Hi. The majority of serviceability tools don't require dynamically loading an agent, and the majority of applications never load an agent dynamically.

While I wouldn't be surprised that the majority don't load agents dynamically, I wouldn't want to diminish the importance of this capability for those that do make use of it. And I believe the number that do dynamically load might surprise you. But then, my data on this is likely highly biased. Do you have better data to support this viewpoint?

True, there are some tools that will be affected, which is why the decision was to introduce the flag in JDK 9 and to announce this change, but change the default in a later version to give tools ample time to prepare their users. The rationale for this change then hasn't changed, but will be reiterated in a JEP (we just wanted to announce this ahead of the JEP to give tool authors another reminder more than six months ahead of JDK 21). The only change between then and now is that even fewer use cases require dynamically loaded agents, and so the impact is even smaller.

Again, I'm not sure I see the data to support this. But then again, my viewpoint remains highly biased. And I see an assumption that tools will be able to easily adapt to this change. I'm not sure that is entirely true. At least not in a way that in effect returns dynamic attach capabilities with a directly loaded proxy.

It is also true that, when starting an application you don't know that you *will* need to load an agent, but in most situations you know that you might. E.g. processes that are too critical to bring down even for deep maintenance (although not many of these are written in modern version of Java anyone) or canary services that are under trial.
The relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers; these users are better equipped to weigh the risks and tradeoffs involved.

Again, I'm not sure I'd equate numbers to importance. As an analogy, there are very few people that know how to build cars and maybe more that know how to fix them, but there are certainly many, many more that know how to use them.

Finally, some tools that require dynamically loaded JVM TI agents, such as profilers that profile native code, are so tied to the VM's internals that the best place for them is in the JDK. If anything, the bigger problem is not that profilers are used too much in production, but too little, including less advanced ones that don't require an agent. There is plenty of time to enhance the JDK's built-in profiling capabilities ahead of demand.

At odds is that all profilers come with biases. I've always stressed that the first thing one needs to do with any profiler is discover its biases and then determine how that bias affects the results, how one should account for the bias, or even whether another tool should be used. To this point, JFR, a built-in profiler, has significant biases. For example, allocation profiling, quite often, completely misses allocation hotspots for small objects that are not scalar replaced in all but trivial examples. But let's not pick on JFR because it is the tool of choice for many other things and other allocation profilers do have other biases such as altering JIT behavior that may cause EA to fail, thus preventing otherwise eligible allocation hotspots from being scalar replaced. While one tool is blind, the other generates false positives. Knowing this, I can combine the JIT logs with profiler results to help offset the effects of the bias for the latter profile. However, I can't do anything for the blind spot.
Finally, if there is any lesson to be learned from the migrations from 8 to 9, it is that tooling is a huge anchor preventing people from upgrading. That JDK 8 is still in as widespread use as it is, is in no small part due to the extensive change in the tooling chain. In fact, a number of very useful tools simply didn't survive, leaving us with less desirable alternatives. The other historical data point that may be of comparison is the introduction of generics into the language. While this slowed the adoption of JDK 5 (from 1.4.2), it had nowhere near the impact that the degradation of the observability/diagnostic tool chain had on the migration rates from 7 to 8 and then this huge impact of 9. In my opinion, we've learned enough from this migration to understand that we may need to re-evaluate decisions that were made prior to these learnings.

Kind regards,
Kirk

-- Ron

On 20 Mar 2023, at 01:21, Andrei Pangin wrote:

Hi all,

Serviceability has been one of the biggest Java strengths, but the proposed change is going to have a large negative impact on it. Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app at runtime. JFR cannot close this gap due to lack of capabilities modern Java profilers have (that's a separate topic though).

When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not even be feasible to add an agent at startup in containerized applications. Starting a profiler on demand from the host OS or from a sidecar is the only viable solution in these cases.

Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance. Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other.
The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is to get a value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching for fixing trivial bugs or for adding debug logs dynamically. A prominent example is when the dynamic agent proved an irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046.

I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases.

Thank you,
Andrei

On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote:

Hi.

In JDK 21 we intend to disallow the dynamic loading of agents by default. This will affect tools that use the Attach API to load an agent into a JVM some time after the JVM has started [1]. There is no change to any of the mechanisms that load an agent at JVM startup (-javaagent/-agentlib on the command line or the Launcher-Agent-Class attribute in the main JAR's manifest).

This change in default behavior was proposed in 2017 as part of JEP 261 [2][3]. At that time the consensus was to switch to this default not in JDK 9 but in a later release to give tool maintainers sufficient time to inform their users. To allow the dynamic loading of agents, users will need to specify -XX:+EnableDynamicAgentLoading on the command line.

I'll post a draft JEP for review shortly.

-- Ron

[1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
[2]: https://openjdk.org/jeps/261
[3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html

From dcubed at openjdk.org Mon Mar 20 17:55:01 2023
From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Mon, 20 Mar 2023 17:55:01 GMT Subject: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2] In-Reply-To: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> References: <48-bBJEONRY6wsWff25Gv6X2ETeNnTS0uUmZ7JdjVVc=.446d3f9f-62aa-4676-9a54-f9740a95ac9e@github.com> Message-ID: On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve NoClassDefFoundError failures; It's known JTReg issue that "@build" actions for part of used shared classes may cause intermittent NoClassDefFoundError in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > feedback Sigh... And again we have the situation where some folks are adding `@build` directives and other folks are removing `@build` directives. Another recent PR added library build directives: https://github.com/openjdk/jdk/pull/13085 based on quoted guidance from the JTREG documentation. This mess is related to: [CODETOOLS-7902847](https://bugs.openjdk.org/browse/CODETOOLS-7902847) Class directory of a test case should not be used to compile a library and NoClassDefFoundErrors show up when doing parallel execution of tests where more than one test uses the "offending" library. We really, really need @jonathan-gibbons to chime in on review threads like these. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13030#issuecomment-1476684726 From dcubed at openjdk.org Mon Mar 20 17:55:59 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Mon, 20 Mar 2023 17:55:59 GMT Subject: RFR: JDK-8304163: Move jdk.internal.module.ModuleInfoWriter to the test library [v2] In-Reply-To: References: Message-ID: On Sat, 18 Mar 2023 19:14:09 GMT, Mandy Chung wrote: >> `ModuleInfoWriter` is not used by the runtime. Move it to the test library as `jdk.test.lib.util.ModuleInfoWriter`. The tests are updated to use the test library instead. `ModuleInfoWriter` depends on `jdk.internal.module` types and the Classfile API. Hence `@modules java.base/jdk.internal.classfile` and other classfile subpackages are added. > > Mandy Chung has updated the pull request incrementally with one additional commit since the last revision: > > move @library after @modules per the recommended ordering Sigh... And again we have the situation where some folks are adding `@build` directives and other folks are removing `@build` directives. Another recent PR removed library build directives: https://github.com/openjdk/jdk/pull/13030 and that made the related tests stop failing with NoClassDefFoundErrors. This mess is related to: [CODETOOLS-7902847](https://bugs.openjdk.org/browse/CODETOOLS-7902847) Class directory of a test case should not be used to compile a library and these problems show up when doing parallel execution of tests where more than one test uses the "offending" library. We really, really need @jonathan-gibbons to chime in on review threads like these. 
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13085#issuecomment-1476684779

From ron.pressler at oracle.com Mon Mar 20 17:58:33 2023
From: ron.pressler at oracle.com (Ron Pressler)
Date: Mon, 20 Mar 2023 17:58:33 +0000
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com>
Message-ID:

> On 20 Mar 2023, at 17:53, Ron Pressler wrote:
>
> While the JEP will reiterate the relevant considerations (and no one denies that dynamically loaded agents are not useful)

Sorry, no one denies that dynamically loaded agents *are* useful :)

From lmesnik at openjdk.org Mon Mar 20 18:30:21 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 20 Mar 2023 18:30:21 GMT
Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE [v2]
In-Reply-To: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com>
References: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com>
Message-ID:

On Sat, 18 Mar 2023 19:54:51 GMT, Leonid Mesnik wrote:

>> The com/sun/jdi tests are located in the one package, and classes with same name cause 'class duplication error' when this directory is opened as source code in IDE.
>>
>> The simplest fix to avoid this is just to rename class.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
> updated name

It is just a prefix for this test. It is named CLETest, where CLE is co-located event. I'm fine to change it to any unique enough name. However I didn't find any.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13069#issuecomment-1476734874

From cjplummer at openjdk.org Mon Mar 20 18:54:33 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 20 Mar 2023 18:54:33 GMT
Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE [v2]
In-Reply-To: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com>
References: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com>
Message-ID:

On Sat, 18 Mar 2023 19:54:51 GMT, Leonid Mesnik wrote:

>> The com/sun/jdi tests are located in the one package, and classes with same name cause 'class duplication error' when this directory is opened as source code in IDE.
>>
>> The simplest fix to avoid this is just to rename class.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
> updated name

Marked as reviewed by cjplummer (Reviewer).

-------------
PR Review: https://git.openjdk.org/jdk/pull/13069#pullrequestreview-1349211685

From sspitsyn at openjdk.org Mon Mar 20 19:58:51 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Mon, 20 Mar 2023 19:58:51 GMT
Subject: Integrated: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics
In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com>
References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com>
Message-ID: <2QSQ5C7cdI-KgoFNa3aLqdYQQbLMY7P3qrKvVMsN86I=.503493f9-d9d0-4bff-b8a0-f95f82d412bb@github.com>

On Thu, 16 Mar 2023 05:03:51 GMT, Serguei Spitsyn wrote:

> This is needed for future performance/scalability improvements in JVMTI support of virtual threads.
> The update includes the following:
>
> 1.
Refactored the `VirtualThread` native methods:
> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount`
> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount`
> 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`:
> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin`
> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
> `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin`
> `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end`
> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
> 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`.
> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable.
>
> Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now.
>
> Testing:
> - Ran mach5 tiers 1-6. No regressions were found.

This pull request has now been integrated.
Changeset: bc0ed730 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/bc0ed730f2c9dad55d0046b4fe8c9cd623b6dbf8 Stats: 449 lines in 20 files changed: 276 ins; 125 del; 48 mod 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics Reviewed-by: vlivanov, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/13054 From duke at openjdk.org Mon Mar 20 20:09:30 2023 From: duke at openjdk.org (Eirik Bjorsnos) Date: Mon, 20 Mar 2023 20:09:30 GMT Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java Message-ID: Please review this PR which replaces the use of outdated JVM flags for setting up debugging in the test value004.java This is part of an ongoing effort to remove use of the outdated flag '-Djava.compiler" such that the option itself can eventually be removed. ------------- Commit messages: - Update copyright year - Modernize JVM args for setting up debugging in the test value004.java Changes: https://git.openjdk.org/jdk/pull/13107/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13107&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304543 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13107.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13107/head:pull/13107 PR: https://git.openjdk.org/jdk/pull/13107 From cjplummer at openjdk.org Mon Mar 20 20:37:15 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 20 Mar 2023 20:37:15 GMT Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 19:47:10 GMT, Eirik Bjorsnos wrote: > Please review this PR which replaces the use of outdated JVM flags for setting up debugging in the test value004.java > > This is part of an ongoing effort to remove use of the outdated flag '-Djava.compiler" such that the option itself can 
eventually be removed.

This is the first I've seen of this java.compiler setting w.r.t. debugging. Is this because at one point debugging required that the JIT be disabled, but no longer does? I found the following in com/sun/tools/jdi/SunCommandLineLauncher.java and wonder if it is also dated and can be removed:

    if ((options.indexOf("-Djava.compiler=") != -1) &&
        (options.toLowerCase().indexOf("-djava.compiler=none") == -1)) {
        throw new IllegalConnectorArgumentsException("Cannot debug with a JIT compiler",
                                                     ARG_OPTIONS);
    }

-------------
PR Comment: https://git.openjdk.org/jdk/pull/13107#issuecomment-1476893802

From duke at openjdk.org Mon Mar 20 20:42:42 2023
From: duke at openjdk.org (Eirik Bjorsnos)
Date: Mon, 20 Mar 2023 20:42:42 GMT
Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java
In-Reply-To:
References:
Message-ID:

On Mon, 20 Mar 2023 20:34:19 GMT, Chris Plummer wrote:

> This is the first I've seen of this java.compiler setting w.r.t. debugging. Is this because at one point debugging required that the JIT be disabled, but no longer does? I found the following in com/sun/tools/jdi/SunCommandLineLauncher.java and wonder if it is also dated and can be removed:

I do remember noticing this in my first pass through the code but think I miscategorized it as part of the implementation, not simply a use site, which it seems to be. I'll file a separate PR for SunCommandLineLauncher. Thanks for catching this, Chris!
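
For readers skimming the thread, the guard quoted above boils down to a two-part string test. A minimal sketch follows, assuming nothing beyond the quoted condition; `JitGuardSketch` and `rejects` are hypothetical names used for illustration and are not part of the JDK.

```java
import java.util.Locale;

// Hypothetical helper, not JDK code: restates the quoted guard as a predicate.
final class JitGuardSketch {
    private JitGuardSketch() {}

    // Returns true when the options set java.compiler to something other
    // than NONE, which is the case the quoted check refuses to launch.
    static boolean rejects(String options) {
        boolean setsCompiler = options.contains("-Djava.compiler=");
        boolean setsNone = options.toLowerCase(Locale.ROOT)
                                  .contains("-djava.compiler=none");
        return setsCompiler && !setsNone;
    }

    public static void main(String[] args) {
        System.out.println(rejects("-Xmx1g"));                // false
        System.out.println(rejects("-Djava.compiler=NONE"));  // false
        System.out.println(rejects("-Djava.compiler=foo"));   // true
    }
}
```

Only an explicit `-Djava.compiler=` value other than NONE trips the check; options that never mention the property pass straight through.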
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13107#issuecomment-1476901357

From duke at openjdk.org Mon Mar 20 21:04:31 2023
From: duke at openjdk.org (Eirik Bjorsnos)
Date: Mon, 20 Mar 2023 21:04:31 GMT
Subject: RFR: 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java
Message-ID: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com>

Please review this PR which removes the following outdated guard/check from SunCommandLineLauncher:

    if ((options.indexOf("-Djava.compiler=") != -1) &&
        (options.toLowerCase().indexOf("-djava.compiler=none") == -1)) {
        throw new IllegalConnectorArgumentsException("Cannot debug with a JIT compiler",
                                                     ARG_OPTIONS);
    }

Efforts are underway to remove the `java.compiler` system property entirely, besides that this test no longer makes sense since debugging with a JIT has been supported for a while.

-------------
Commit messages:
- Remove check of java.compiler since the property is being removed and the thing tested is no longer valid (debugging works fine with JITs)

Changes: https://git.openjdk.org/jdk/pull/13109/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13109&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8304547
Stats: 7 lines in 1 file changed: 0 ins; 6 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/13109.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13109/head:pull/13109

PR: https://git.openjdk.org/jdk/pull/13109

From duke at openjdk.org Mon Mar 20 21:06:39 2023
From: duke at openjdk.org (Eirik Bjorsnos)
Date: Mon, 20 Mar 2023 21:06:39 GMT
Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java
In-Reply-To:
References:
Message-ID:

On Mon, 20 Mar 2023 20:40:09 GMT, Eirik Bjorsnos wrote:

> I'll file a separate PR for SunCommandLineLauncher. Thanks for catching this, Chris!

Filed #13109.
Since these are my first two PRs in the serviceability area, I would appreciate an extra eye on the JBS issues to see that I got the fields right.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/13107#issuecomment-1476931871

From cjplummer at openjdk.org Mon Mar 20 21:32:41 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 20 Mar 2023 21:32:41 GMT
Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java
In-Reply-To:
References:
Message-ID:

On Mon, 20 Mar 2023 21:04:04 GMT, Eirik Bjorsnos wrote:

> Filed #13109. Since these are my first two PRs in the serviceability area, I would appreciate an extra eye on the JBS issues to see that I got the fields right.

core-svc/debugger is how all JDI, debug agent, and JDWP spec bugs should be categorized, including tests.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/13107#issuecomment-1476960317

From sspitsyn at openjdk.org Mon Mar 20 22:13:39 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Mon, 20 Mar 2023 22:13:39 GMT
Subject: RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE [v2]
In-Reply-To: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com>
References: <_7Lk8ehTOJGUuRX7wd6nHPNcu9-IBfN-Ie10jG7Ll_U=.0a964a49-2fb6-46c7-ac19-ec02c8fce820@github.com>
Message-ID: <7uvDOsLFiwvzNXajMnxfx2ogRrImXWjzm1igXiBJjGI=.a83a6c97-a240-46c5-89b1-a3510e34c8a7@github.com>

On Sat, 18 Mar 2023 19:54:51 GMT, Leonid Mesnik wrote:

>> The com/sun/jdi tests are located in the one package, and classes with same name cause 'class duplication error' when this directory is opened as source code in IDE.
>>
>> The simplest fix to avoid this is just to rename class.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
> updated name

Marked as reviewed by sspitsyn (Reviewer).

Okay, thanks.
-------------
PR Review: https://git.openjdk.org/jdk/pull/13069#pullrequestreview-1349470664
PR Comment: https://git.openjdk.org/jdk/pull/13069#issuecomment-1477005781

From sspitsyn at openjdk.org Tue Mar 21 00:56:51 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 21 Mar 2023 00:56:51 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote:

>> Greetings,
>>
>> We are adding support to let JFR report on Agents.
>>
>> #### Design
>>
>> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize.
>>
>> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html)
>>
>> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
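
To make the terminology concrete, a JavaAgent in the sense described above is a class exposing `premain` and/or `agentmain` entry points. The sketch below is illustrative only: the class name `SketchAgent` and the `describe` helper are hypothetical, and the manifest attributes named in the comments (`Premain-Class`, `Agent-Class`) are the standard java.lang.instrument ones rather than anything specific to this PR.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class for illustration; the name does not come from the PR.
final class SketchAgent {
    private SketchAgent() {}

    // Called by the JVM at startup for -javaagent:<jar>=<options>
    // (the jar's manifest must name this class in Premain-Class).
    public static void premain(String options, Instrumentation inst) {
        System.out.println(describe(options, false));
    }

    // Called when the agent is loaded into an already running JVM
    // (the jar's manifest must name this class in Agent-Class).
    public static void agentmain(String options, Instrumentation inst) {
        System.out.println(describe(options, true));
    }

    // Assumed helper: formats the two pieces of context an event could report.
    static String describe(String options, boolean dynamic) {
        return "options=" + options + ", dynamic=" + dynamic;
    }
}
```

Started with `-javaagent:JavaAgent.jar=foo=bar`, `premain` receives the string `foo=bar`; a dynamic load goes through `agentmain` instead, which is the distinction the `dynamic` field of the proposed event captures.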
>>
>> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples:
>>
>> // Command line
>> jdk.JavaAgent {
>>     startTime = 12:31:19.789 (2023-03-08)
>>     name = "JavaAgent.jar"
>>     options = "foo=bar"
>>     dynamic = false
>>     initialization = 12:31:15.574 (2023-03-08)
>>     initializationTime = 172 ms
>> }
>>
>> // Dynamic load
>> jdk.JavaAgent {
>>     startTime = 12:31:31.158 (2023-03-08)
>>     name = "JavaAgent.jar"
>>     options = "bar=baz"
>>     dynamic = true
>>     initialization = 12:31:31.037 (2023-03-08)
>>     initializationTime = 64,1 ms
>> }
>>
>> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
>>
>> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar.
>>
>> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true)
>>
>> "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method.
>>
>> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime".
>>
>> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>> jdk.NativeAgent {
>>     startTime = 12:31:40.398 (2023-03-08)
>>     name = "jdwp"
>>     options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>     dynamic = false
>>     initialization = 12:31:36.142 (2023-03-08)
>>     initializationTime = 0,00184 ms
>>     path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>> }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported.
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
>>
>> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp.
>>
>> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>>
>> Testing: jdk_jfr, tier 1 - 6
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
> more cleanup

src/hotspot/share/prims/jvmtiEnvBase.hpp line 166:

> 164:
> 165: const void* get_env_local_storage() { return _env_local_storage; }
> 166:

Why was this change/move necessary? Am I missing anything?

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1142804605

From dholmes at openjdk.org Tue Mar 21 02:31:40 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 21 Mar 2023 02:31:40 GMT
Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java
In-Reply-To:
References:
Message-ID:

On Mon, 20 Mar 2023 19:47:10 GMT, Eirik Bjorsnos wrote:

> Please review this PR which replaces the use of outdated JVM flags for setting up debugging in the test value004.java
>
> This is part of an ongoing effort to remove use of the outdated flag '-Djava.compiler' such that the option itself can eventually be removed.

Looks good.

Xdebug is supposed to be a no-op these days, except when used in conjunction with `-Djava.compiler=NONE`. In that case the use of Xdebug actually causes `-Djava.compiler=NONE` to be ignored.

-------------
Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13107#pullrequestreview-1349622834 From dholmes at openjdk.org Tue Mar 21 02:35:49 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 21 Mar 2023 02:35:49 GMT Subject: RFR: 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java In-Reply-To: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> References: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> Message-ID: On Mon, 20 Mar 2023 20:53:41 GMT, Eirik Bjorsnos wrote: > Please review this PR which removes the following outdated guard/check from SunCommandLineLauncher: > > > if ((options.indexOf("-Djava.compiler=") != -1) && > (options.toLowerCase().indexOf("-djava.compiler=none") == -1)) { > throw new IllegalConnectorArgumentsException("Cannot debug with a JIT compiler", > ARG_OPTIONS); > } > > > > Efforts are underway to remove the `java.compiler` system property entirely, besides that this test no longer makes sense since debugging with a JIT has been supported for a while. Looks good. This was actually found years ago but the issue closed: [JDK-6374661](https://bugs.openjdk.org/browse/JDK-6374661) ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13109#pullrequestreview-1349624636 From cjplummer at openjdk.org Tue Mar 21 02:51:41 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 Mar 2023 02:51:41 GMT Subject: RFR: 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java In-Reply-To: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> References: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> Message-ID: On Mon, 20 Mar 2023 20:53:41 GMT, Eirik Bjorsnos wrote: > Please review this PR which removes the following outdated guard/check from SunCommandLineLauncher: > > > if ((options.indexOf("-Djava.compiler=") != -1) && > (options.toLowerCase().indexOf("-djava.compiler=none") == -1)) { > throw new IllegalConnectorArgumentsException("Cannot debug with a JIT compiler", > ARG_OPTIONS); > } > > > > Efforts are underway to remove the `java.compiler` system property entirely, besides that this test no longer makes sense since debugging with a JIT has been supported for a while. Marked as reviewed by cjplummer (Reviewer). 
Please make sure you run the jdk/com/sun/jdi tests and the jdb, jdwp, and jdi test under hotspot/jtreg/vmTestbase/nsk ------------- PR Review: https://git.openjdk.org/jdk/pull/13109#pullrequestreview-1349632141 PR Comment: https://git.openjdk.org/jdk/pull/13109#issuecomment-1477208049 From cjplummer at openjdk.org Tue Mar 21 02:52:42 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 Mar 2023 02:52:42 GMT Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 19:47:10 GMT, Eirik Bjorsnos wrote: > Please review this PR which replaces the use of outdated JVM flags for setting up debugging in the test value004.java > > This is part of an ongoing effort to remove use of the outdated flag '-Djava.compiler" such that the option itself can eventually be removed. Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13107#pullrequestreview-1349632733 From alanb at openjdk.org Tue Mar 21 07:29:44 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 21 Mar 2023 07:29:44 GMT Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java In-Reply-To: References: Message-ID: <56U4EpcV8_VSi6Oj6q7-kHoFm6hJ9v498ncKwLvElaM=.501dbe28-39ef-404e-b39e-d1dbd2e64de1@github.com> On Mon, 20 Mar 2023 19:47:10 GMT, Eirik Bjorsnos wrote: > Please review this PR which replaces the use of outdated JVM flags for setting up debugging in the test value004.java > > This is part of an ongoing effort to remove use of the outdated flag '-Djava.compiler" such that the option itself can eventually be removed. Marked as reviewed by alanb (Reviewer). 
test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java line 125: > 123: flg = true; > 124: ovl = argHandler.getLaunchExecPath() + " " > 125: +"-agentlib:jdwp=transport=" + c.transport().name() +",server=n,suspend=y," Good. Full speed debugging came in JDK 1.4. In JDK 1.5, -agentlib and -agentpath were added as standard options for starting agents on the command line, so -Xdebug became redundant. I think -Xnoagent dates from the Classic VM. ------------- PR Review: https://git.openjdk.org/jdk/pull/13107#pullrequestreview-1349827471 PR Review Comment: https://git.openjdk.org/jdk/pull/13107#discussion_r1142973202 From alanb at openjdk.org Tue Mar 21 07:31:51 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 21 Mar 2023 07:31:51 GMT Subject: RFR: 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java In-Reply-To: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> References: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> Message-ID: On Mon, 20 Mar 2023 20:53:41 GMT, Eirik Bjorsnos wrote: > Please review this PR which removes the following outdated guard/check from SunCommandLineLauncher: > > > if ((options.indexOf("-Djava.compiler=") != -1) && > (options.toLowerCase().indexOf("-djava.compiler=none") == -1)) { > throw new IllegalConnectorArgumentsException("Cannot debug with a JIT compiler", > ARG_OPTIONS); > } > > > > Efforts are underway to remove the `java.compiler` system property entirely, besides that this test no longer makes sense since debugging with a JIT has been supported for a while. Marked as reviewed by alanb (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13109#pullrequestreview-1349829231 From duke at openjdk.org Tue Mar 21 08:25:44 2023 From: duke at openjdk.org (Eirik Bjorsnos) Date: Tue, 21 Mar 2023 08:25:44 GMT Subject: RFR: 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java In-Reply-To: References: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> Message-ID: On Tue, 21 Mar 2023 02:49:15 GMT, Chris Plummer wrote: > Please make sure you run the jdk/com/sun/jdi tests and the jdb, jdwp, and jdi test under hotspot/jtreg/vmTestbase/nsk Tests passed:

==============================
Test summary
==============================
   TEST                                          TOTAL  PASS  FAIL ERROR
   jtreg:test/jdk/com/sun/jdi                      177   177     0     0
   jtreg:test/hotspot/jtreg/vmTestbase/nsk/jdi    1142  1142     0     0
   jtreg:test/hotspot/jtreg/vmTestbase/nsk/jdb      67    67     0     0
   jtreg:test/hotspot/jtreg/vmTestbase/nsk/jdwp    113   113     0     0
==============================

------------- PR Comment: https://git.openjdk.org/jdk/pull/13109#issuecomment-1477432327 From jsjolen at openjdk.org Tue Mar 21 09:57:52 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 21 Mar 2023 09:57:52 GMT Subject: RFR: JDK-8300245: Replace NULL with nullptr in share/jfr/ [v4] In-Reply-To: References: Message-ID: On Tue, 24 Jan 2023 21:10:31 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/jfr/. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> Macros having their NULL changed to nullptr, these are added to the script when I find them. 
They should be NULL. >> nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this:
>>
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>>
>> Note how nullptr participates in a code expression here, we really are talking about the specific value nullptr. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix outdated copyright Not yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12034#issuecomment-1477547098 From aph at openjdk.org Tue Mar 21 10:53:53 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 21 Mar 2023 10:53:53 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 14:29:35 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fix riscv interpreter mistake and acquire semantics src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 491: > 489: } else { > 490: // Pop N words from the stack > 491: __ get_cache_and_index_at_bcp(r1, r2, 1, index_size); This aliasing of `r1` and `cache` is very confusing. Please decide whether to use the name `cache` or `r1` and do so consistently. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2337: > 2335: // Load-acquire the adapter method > 2336: __ lea(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); > 2337: __ ldar(method, method); What reordering are we trying to prevent here? src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2399: > 2397: bool is_invokevirtual, > 2398: bool is_invokevfinal, /*unused*/ > 2399: bool is_invokedynamic /*unused*/) { Suggestion: bool /*is_invokedynamic*/) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143191698 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143193859 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143195547 From ron.pressler at oracle.com Tue Mar 21 12:40:03 2023 From: ron.pressler at oracle.com (Ron Pressler) Date: Tue, 21 Mar 2023 12:40:03 +0000 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com> Message-ID: <4D0F2EC3-1EED-45AB-B2F7-86D4AFD8006A@oracle.com> Hi Volker. JEP 261 states: "The dynamic loading of JVM TI agents will be disabled by default in a future release. 
To prepare for that change we recommend that applications that allow dynamic agents start using the option -XX:+EnableDynamicAgentLoading to enable that loading explicitly." The purpose of my email was to announce that that change will be put into effect in JDK 21 and to give a final reminder to those who have not yet done so to follow the recommendation in JEP 261 to prepare for that change. The Project Jigsaw team made that decision at the time after considering the perspectives of performance experts, security experts, and tooling experts, but unless anyone has some *new* information to present, there is no point in replaying the same discussions. You can revisit at least some of the technical discussions on jigsaw-dev. I will summarise the salient aspects (all discussed at the time) in the forthcoming JEP but, briefly, dynamically loaded agents -- alongside JNI and Unsafe -- break integrity, the ability to guarantee certain invariants, which has various implications on performance, security, and code evolution. They don't always break integrity in the way a cursory contemplation would suggest, which is why you should study those discussions if you're interested in the subject. Since JEP 261, the JDK has been evolving under the assumption that integrity is preserved unless the application grants explicit consent for it to be broken. As far as security in particular is concerned, the point you made is *not* the relevant one to the implications that were considered by Project Jigsaw at the time. As a member of the Vulnerability Group you may want to discuss that particular aspect with the appropriate people. -- Ron > On 20 Mar 2023, at 12:16, Volker Simonis wrote: > > Hi Ron, > > I'm still missing convincing technical arguments for disallowing > dynamic loading of agents. > > If the argument is security then I can only agree with previous > answers in that an attacker needs local access with the same > credentials like the attacked JVM. 
But once he has that, all bets are > off anyway. > > If you plan for features/enhancements/optimizations that rely on not > being able to dynamically load an agent (which I haven't heard of > yet), I don't understand this change either. Because as long as a > switch for enabling dynamic loading exists (and I haven't heard that > you want to completely forbid it) the dynamic loading use case has to > be supported anyway. > > Dynamic agent loading is one of the features which sets the OpenJDK > apart from other languages, managed runtimes and even closely related > platforms like for example GraalVM Native Image which don't support > such a feature. The mere existence of tools which rely on it and > which are in widespread productive use, demonstrates its usefulness. > And it is always good to know you have this possibility in your > toolbox for the worst case (e.g. our log4j-hotpatcher [1]). > > I also can't buy your argument that "the relatively few sophisticated > users who know how to write ad-hoc agents can even opt to enable > dynamic agent loading on all their servers". It is *exactly* not the > few sophisticated authors of dynamic agents who would need to enable > them but instead the millions of ingenious end-users and > administrators who beg for help once they run into trouble. The other > way round makes much more sense to me - the few sophisticated users > who know for sure that they will never need the help of dynamic agents > are free to disable them at startup. > > Given the current arguments, for me the usefulness of dynamic agents > outweighs their drawbacks by far. Of course every OpenJDK distributor > is free to change the default settings of command line options at his > sole discretion, but I don't currently see a compelling reason for > doing this by default for the whole OpenJDK community. If you have > future plans which rely on disabling/forbidding dynamic agents please > let us know. 
> > Best regards, > Volker > > [1] https://urldefense.com/v3/__https://aws.amazon.com/blogs/opensource/hotpatch-for-apache-log4j/__;!!ACWV5N9M2RV99hQ!I7QWWsAmQNvmFzektSGaq4lWBWuMxP5R8P6nSwxfugmyEpKOrd_Io64JBX9mD8PBHywYZ7gEbDumhe5MdiQz_QdvpQ$ > > On Mon, Mar 20, 2023 at 11:37 AM Jaroslav Bachorik wrote: >> >> Hi, >> >> On Mon, Mar 20, 2023 at 11:11 AM Ron Pressler wrote: >>> >>> Hi. >>> >>> The majority of serviceability tools don't require dynamically loading an agent, and the majority of applications never load an agent dynamically. >> >> >> The majority of the JDK built-in tools, I would say. What about eg. the JMC agent? >> >>> >>> >>> True, there are some tools that will be affected, which is why the decision was to introduce the flag in JDK 9 and to announce this change, but change the default in a later version to give tools ample time to prepare their users. The rationale for this change then hasn't changed, but will be reiterated in a JEP (we just wanted to announce this ahead of the JEP to give tool authors another reminder more than six months ahead of JDK 21). The only change between then and now is that even fewer use cases require dynamically loaded agents, and so the impact is even smaller. >> >> >> As a maintainer of one of such tools I can confidently say that this change will either kill the tool as the ease of use will be gone or the workaround (eg. using JAVA_TOOL_OPTIONS) will completely defeat the purpose of this change. Having to put a flag when starting the JVM to allow dynamic loading of agents sounds a bit nonsensical to me - it would be much easier to directly add the agent to the JVM startup and then implement a lightweight control protocol over socket/shared memory to enable/disable the agent features dynamically. >> >>> >>> >>> It is also true that, when starting an application you don't know that you *will* need to load an agent, but in most situations you know that you might. E.g. 
processes that are too critical to bring down even for deep maintenance (although not many of these are written in modern version of Java anyone) or canary services that are under trial. The relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers; these users are better equipped to weigh the risks and tradeoffs involved. >> >> >> Wouldn't having this enabled system-wide actually defeat the purpose of having this flag? Considering that the dynamic attach can be performed only on the same host under the same user as the target process there seems to be a very small chance of loading agents accidentally. In the end people would set up their systems to enable dynamic agent loading via eg. JAVA_TOOL_OPTIONS and we will be in the same place as before, with the additional hurdle of setting everything up. >> >>> >>> Finally, some tools that require dynamically loaded JVM TI agents, such as profilers that profile native code, are so tied to the VM's internals that the best place for them is in the JDK. If anything, the bigger problem is not that profilers are used too much in production, but too little, including less advanced ones that don't require an agent. There is plenty of time to enhance the JDK's built-in profiling capabilities ahead of demand. >> >> >> I think this is an overly optimistic view. It is *much more* difficult to enhance the JDK's built-in profiling capabilities than do the same in an external profiling agent. >> >> >> Overall, I don't seem to understand the anticipated attack vectors this change is supposed to prevent. AFAIK, in order to do the dynamic agent load one needs to have full access to the target process. That means that there are more convenient and straightforward ways to do anything nefarious than loading a JVMTI agent. 
Am I missing some other usages where the JVMTI agent would actually give access to something which would be otherwise inaccessible considering that the attacher and attachee must be on the same host and under the same user? >> >> Cheers, >> >> -JB- >> >>> >>> >>> -- Ron >>> >>> On 20 Mar 2023, at 01:21, Andrei Pangin wrote: >>> >>> Hi all, >>> >>> Serviceability has been one of the biggest Java strengths, but the proposed change is going to have a large negative impact on it. >>> >>> Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app at runtime. JFR cannot close this gap due to lack of capabilities modern Java profilers have (that's a separate topic though). >>> >>> When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not be even feasible to add an agent at startup in containerized applications. Starting profiler on demand from the host OS or from a sidecar is the only viable solution in these cases. >>> >>> Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance. Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other. >>> >>> The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is to get a value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching for fixing trivial bugs or for adding debug logs dynamically. The prominent example is when the dynamic agent proved an irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046. 
>>> >>> I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases. >>> >>> Thank you, >>> Andrei >>> >>> On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote: >>>> >>>> Hi. >>>> >>>> In JDK 21 we intend to disallow the dynamic loading of agents by default. This >>>> will affect tools that use the Attach API to load an agent into a JVM some time >>>> after the JVM has started [1]. There is no change to any of the mechanisms that >>>> load an agent at JVM startup (-javaagent/-agentlib on the command line or the >>>> Launcher-Agent-Class attribute in the main JAR's manifest). >>>> >>>> This change in default behavior was proposed in 2017 as part of JEP 261 [2][3]. >>>> At that time the consensus was to switch to this default not in JDK 9 but in a >>>> later release to give tool maintainers sufficient time to inform their users. >>>> To allow the dynamic loading of agents, users will need to specify >>>> -XX:+EnableDynamicAgentLoading on the command line. >>>> >>>> I'll post a draft JEP for review shortly. 
>>>> >>>> -- Ron >>>> >>>> [1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html >>>> [2]: https://openjdk.org/jeps/261 >>>> [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html >>> >>> From matsaave at openjdk.org Tue Mar 21 14:49:54 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 21 Mar 2023 14:49:54 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 10:51:08 GMT, Andrew Haley wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix riscv interpreter mistake and acquire semantics > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2399: > >> 2397: bool is_invokevirtual, >> 2398: bool is_invokevfinal, /*unused*/ >> 2399: bool is_invokedynamic /*unused*/) { > > Suggestion: > > bool /*is_invokedynamic*/) { This is a temporary bandaid in the same format we see for `is_invokefinal`, both of which should be cleaned up once all the ports are complete. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143508140 From rrich at openjdk.org Tue Mar 21 17:43:57 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Mar 2023 17:43:57 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 10:49:32 GMT, Andrew Haley wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix riscv interpreter mistake and acquire semantics > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2337: > >> 2335: // Load-acquire the adapter method >> 2336: __ lea(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); >> 2337: __ ldar(method, method); > > What reordering are we trying to prevent here? The loads of the data stored in `ResolvedIndyEntry::fill_in()` must not be reordered with loading `ResolvedIndyEntry::_method`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143781928 From cjplummer at openjdk.org Tue Mar 21 18:03:50 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 Mar 2023 18:03:50 GMT Subject: Integrated: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" In-Reply-To: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> References: <8K44GQWCzNKVse0bGevIGWxoE1GErrhPPdF6BDgNklU=.5ee990be-1e65-4e71-a5c6-ab3918ee4bd0@github.com> Message-ID: On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. 
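The join fix described above can be illustrated with a small standalone sketch. It is not the InvokeHangTest change itself: the class and method names are hypothetical, and daemon platform threads stand in for virtual threads (both are daemon threads, so the JVM does not wait for them on exit) so that the sketch runs on JDKs without virtual thread support.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not the code from the PR.
class JoinDaemonThreadsSketch {

    // Creates a daemon worker that bumps the counter when it finishes.
    static Thread newDaemonWorker(String name, AtomicInteger completed) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(50); // stand-in for the real work of the test threads
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            completed.incrementAndGet();
        }, name);
        t.setDaemon(true); // like a virtual thread, does not keep the JVM alive
        return t;
    }

    // Starts two daemon workers and blocks until both have finished.
    static int runAndJoin() throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        Thread t1 = newDaemonWorker("worker-1", completed);
        Thread t2 = newDaemonWorker("worker-2", completed);
        t1.start();
        t2.start();
        // Without these joins, main() could return and the JVM could begin
        // shutting down while the daemon workers are still running.
        t1.join();
        t2.join();
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed workers: " + runAndJoin());
    }
}
```

Removing the two join() calls makes the result timing-dependent, which is the kind of intermittent behaviour the bug report describes.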
This pull request has now been integrated. Changeset: 0deb6489 Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/0deb648985b018653ccdaf193dc13b3cf21c088a Stats: 20 lines in 2 files changed: 12 ins; 2 del; 6 mod 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung" Reviewed-by: amenkov, lmesnik, sspitsyn, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/13068 From matsaave at openjdk.org Tue Mar 21 20:01:44 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 21 Mar 2023 20:01:44 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v10] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Consistent register naming for aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/8607f62a..cbe4fdcb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=08-09 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From liach at openjdk.org Tue Mar 21 21:39:48 2023 From: liach at openjdk.org (Chen Liang) Date: Tue, 21 Mar 2023 21:39:48 GMT Subject: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v7] In-Reply-To: References: Message-ID: <3PN6riy9fxQHRz1646Xi5fo-V7pTeF_r4Ojgt3yMRtg=.eb0d3145-3d0c-4c3e-b158-d836f7460110@github.com> > Summaries: > 1. A few recommendations about updating the constant API is made at https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their code generation infrastructure upgraded from ASM to Classfile API. > 3. 
Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Switch to ConstantDescs for and void constants - Merge AnnotationsTest, remove ModuleTargetAttribute call - Merge branch 'invoke-test-classfile' of https://github.com/liachmodded/jdk into invoke-test-classfile - Update test/jdk/java/lang/invoke/8022701/MHIllegalAccess.java Co-authored-by: Andrey Turbanov - Merge branch 'master' into invoke-test-classfile - Fix LambdaStackTrace after running - formatting - Fix failed LambdaStackTrace test, use more convenient APIs - Merge branch 'master' of https://git.openjdk.java.net/jdk into invoke-test-classfile - Shorten lines, move from mask() to ACC_ constants, other misc improvements - ... 
and 1 more: https://git.openjdk.org/jdk/compare/0deb6489...7dc785b3 ------------- Changes: https://git.openjdk.org/jdk/pull/13009/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=06 Stats: 1946 lines in 31 files changed: 302 ins; 889 del; 755 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009 From cjplummer at openjdk.org Tue Mar 21 21:47:37 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 Mar 2023 21:47:37 GMT Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC Message-ID: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com> There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which noticed (rarely) OOME on macos even when not using ZGC. From JDK-8304449: "macOS has a thread behavior that is not seen on linux and windows that is causing more memory usage, which sometimes leads to this unexpected OOME. The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500. On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keep tracks of threads (ThreadReferences) for which a ThreadStartEvent had been received but a ThreadDeathEvent has not. On linux and windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. 
This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result. " The threads collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. I got rid of the threads collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents. ------------- Commit messages: - update problem list - Fix GC related issues Changes: https://git.openjdk.org/jdk/pull/13130/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13130&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304436 Stats: 16 lines in 2 files changed: 0 ins; 6 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13130/head:pull/13130 PR: https://git.openjdk.org/jdk/pull/13130 From aph at openjdk.org Tue Mar 21 22:24:51 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 21 Mar 2023 22:24:51 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 17:40:46 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2337: >> >>> 2335: // Load-acquire the adapter method >>> 2336: __ lea(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); >>> 2337: __ ldar(method, method); >> >> What reordering are we trying to prevent here? > > The loads of the data stored in `ResolvedIndyEntry::fill_in()` must not be reordered with loading `ResolvedIndyEntry::_method`. Ah, I see. This acquiring load matches the releasing store in `fill_in()`. Please add a comment here to that effect. 
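As an illustration of the pairing described above (this is not code from the PR), here is a minimal Java sketch of a releasing store matched by an acquiring load, using `VarHandle`. The class `Entry`, its fields `numParams` and `method`, and the methods `fillIn` and `readParamsIfResolved` are hypothetical stand-ins for `ResolvedIndyEntry`, its data, and `fill_in()`; the real structure lives in HotSpot C++ and differs in detail.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireReleaseSketch {
    // Hypothetical stand-in for a resolved entry; NOT the HotSpot structure.
    public static final class Entry {
        int numParams;   // plain data, written before publication
        Object method;   // publication field, only accessed through METHOD

        static final VarHandle METHOD;
        static {
            try {
                METHOD = MethodHandles.lookup()
                        .findVarHandle(Entry.class, "method", Object.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        // Writer: fill in the data first, then publish with a releasing store.
        public void fillIn(Object m, int params) {
            numParams = params;
            METHOD.setRelease(this, m);
        }

        // Reader: the acquiring load of 'method' must come first. If it sees a
        // non-null value, the later plain read of 'numParams' cannot be
        // reordered before it and observes the value written in fillIn().
        public int readParamsIfResolved() {
            Object m = METHOD.getAcquire(this);
            return (m == null) ? -1 : numParams;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Entry e = new Entry();
        Thread reader = new Thread(() -> {
            int seen;
            while ((seen = e.readParamsIfResolved()) < 0) {
                Thread.onSpinWait();   // entry not resolved yet
            }
            System.out.println("reader saw numParams = " + seen);
        });
        reader.start();
        e.fillIn("adapter", 3);
        reader.join();
    }
}
```

Without the acquire on the reader side, the load of `numParams` could in principle be satisfied before the load of `method`, which is the reordering the `ldar` in the quoted aarch64 code is there to prevent.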
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1144047014 From cjplummer at openjdk.org Tue Mar 21 22:38:18 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 Mar 2023 22:38:18 GMT Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC [v2] In-Reply-To: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com> References: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com> Message-ID: > There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which noticed (rarely) OOME on macos even when not using ZGC. From JDK-8304449: > > "macOS has a thread behavior that is not seen on linux and windows that is causing more memory usage, which sometimes leads to this unexpected OOME. The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500. On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keep tracks of threads (ThreadReferences) for which a ThreadStartEvent had been received but a ThreadDeathEvent has not. On linux and windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result. 
" > > The `threads` collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. I got rid of the `threads` collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents. Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: get rid of some locals that are not needed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13130/files - new: https://git.openjdk.org/jdk/pull/13130/files/2a2efb13..89f73875 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13130&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13130&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13130/head:pull/13130 PR: https://git.openjdk.org/jdk/pull/13130 From kvn at openjdk.org Wed Mar 22 01:19:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Mar 2023 01:19:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking.
Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!)
a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Set condition flags correctly after fast-lock call on aarch64

First, thank you for putting the new code under a flag.

I looked only at the x86 code. It seems fine except for a few places where I added comments.

I wish the locking code for AArch64 and RISC-V was moved to c2_MacroAssembler as on x86, but for this review it is better to keep it where it is.
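For readers following the quoted PR description, the lock-stack idea can be pictured with a toy Java model. This is only a sketch under stated assumptions, not the HotSpot implementation: the class name `LockStackSketch` and all of its methods are hypothetical, there would be one instance per thread, and it models only push, pop, the "does the current thread own me?" scan, and growth. It does not model the mark-word CAS, inflation, ANONYMOUS_OWNER, or GC roots, and it grows inside `push` for simplicity, whereas the PR describes checking for overflow before locking (on method entry in compiled code).

```java
import java.util.Arrays;

public class LockStackSketch {
    // Small array of object references; the PR reports it typically stays
    // at around 3-5 elements. The initial capacity of 4 is an assumption.
    private Object[] elems = new Object[4];
    private int top = 0;

    // Record that the owning thread has fast-locked 'o'.
    public void push(Object o) {
        if (top == elems.length) {
            // Simplification: grow here instead of via a runtime call.
            elems = Arrays.copyOf(elems, elems.length * 2);
        }
        elems[top++] = o;
    }

    // Remove the most recently locked object (balanced monitorexit).
    public Object pop() {
        if (top == 0) {
            throw new IllegalStateException("lock stack is empty");
        }
        Object o = elems[--top];
        elems[top] = null;   // do not keep the object reachable
        return o;
    }

    // "Does the current thread own me?": a linear scan, comparing by identity.
    public boolean contains(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (elems[i] == o) {
                return true;
            }
        }
        return false;
    }

    public int size() {
        return top;
    }

    public static void main(String[] args) {
        LockStackSketch lockStack = new LockStackSketch();
        Object a = new Object();
        Object b = new Object();
        lockStack.push(a);
        System.out.println("owns a: " + lockStack.contains(a));
        System.out.println("owns b: " + lockStack.contains(b));
        lockStack.pop();
        System.out.println("size after unlock: " + lockStack.size());
    }
}
```

Because the array stays short, the linear scan in `contains` is cheap, which is the property the description relies on for the common ownership check.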
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 602:

> 600: movptr(tmpReg, Address(objReg, oopDesc::mark_offset_in_bytes())); // [FETCH]
> 601: testptr(tmpReg, markWord::monitor_value); // inflated vs stack-locked|neutral
> 602: jccb(Assembler::notZero, IsInflated);

Maybe use `jcc` long branch here to be safe.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 670:

> 668: get_thread (scrReg); // beware: clobbers ICCs
> 669: movptr(Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), scrReg);
> 670: xorptr(boxReg, boxReg); // set icc.ZFlag = 1 to indicate success

Should this be under `if (UseFastLocking)`?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 791:

> 789: Compile::current()->output()->add_stub(stub);
> 790: jcc(Assembler::notEqual, stub->entry());
> 791: bind(stub->continuation());

Why use stub here and not inline the code? Because the branch is mostly not taken?

-------------

PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1351551552
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144107482
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144111315
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144119776

From sspitsyn at openjdk.org Wed Mar 22 02:20:40 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 22 Mar 2023 02:20:40 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324
Message-ID:

The fix is to enable support for late binding JVMTI agents.
The fix includes:
- New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol.
- New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing.
It is used by the `WhiteBox` API. - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` Testing: - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` - The originally failed tests are expected to pass now: `runtime/vthread/RedefineClass.java` `runtime/vthread/TestObjectAllocationSampleEvent.java` - In progress: Run the tiers 1-6 to make sure there are no regression. ------------- Commit messages: - cleanup in vmOperation.hpp - restored one incorrectly removed function - removed temporary debugging changes - 8297286: runtime/vthread tests crashing after JDK-8296324 Changes: https://git.openjdk.org/jdk/pull/13133/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297286 Stats: 380 lines in 9 files changed: 377 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From sspitsyn at openjdk.org Wed Mar 22 03:03:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Mar 2023 03:03:18 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: References: Message-ID: > The fix is to enable support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. 
> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: update for missed part in jvmtiExport.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/6e27ee6f..ddac01c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=00-01 Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From sspitsyn at openjdk.org Wed Mar 22 06:06:25 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Mar 2023 06:06:25 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v3] In-Reply-To: References: Message-ID: > The fix is to enable support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. 
It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: add necessary tweak in jvmtiExport.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/ddac01c3..51c3f7d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=01-02 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From dholmes at openjdk.org Wed Mar 22 09:22:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Mar 2023 09:22:00 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 00:53:31 GMT, Serguei Spitsyn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> more cleanup > > src/hotspot/share/prims/jvmtiEnvBase.hpp line 166: > >> 164: >> 165: const void* get_env_local_storage() { return _env_local_storage; } >> 166: > > Why was this change/move necessary? Do I miss anything? It is now public, not protected.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1144458026 From rkennke at openjdk.org Wed Mar 22 09:51:08 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Mar 2023 09:51:08 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Wed, 22 Mar 2023 00:25:43 GMT, Vladimir Kozlov wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 670: > >> 668: get_thread (scrReg); // beware: clobbers ICCs >> 669: movptr(Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), scrReg); >> 670: xorptr(boxReg, boxReg); // set icc.ZFlag = 1 to indicate success > > Should this be under `if (UseFastLocking)`? I don't think so, unless we also want to change all the stuff in x86_32.ad to not fetch the thread before calling into fast_unlock(). However, I think it is a nice and useful change. I could break it out of this PR and get it reviewed separately, it is a side-effect of the new locking impl insofar as we always require a thread register, and allocate&fetch it before going into fast_lock(). > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 791: > >> 789: Compile::current()->output()->add_stub(stub); >> 790: jcc(Assembler::notEqual, stub->entry()); >> 791: bind(stub->continuation()); > > Why use stub here and not inline the code? Because the branch mostly not taken? Yes, the branch is mostly not taken. If we inline the code, then we would have to take a forward branch on the very common path to skip over the (rare) part that handles ANON monitor owner. 
This would throw off static branch prediction and is discouraged by the Intel optimization guide. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144501909 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144504528 From jbechberger at openjdk.org Wed Mar 22 16:20:57 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 22 Mar 2023 16:20:57 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 Message-ID: Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. Tested on my M1 mac. ------------- Commit messages: - Fix 8304725 Changes: https://git.openjdk.org/jdk/pull/13144/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304725 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13144/head:pull/13144 PR: https://git.openjdk.org/jdk/pull/13144 From matsaave at openjdk.org Wed Mar 22 17:42:35 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 22 Mar 2023 17:42:35 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v11] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Improved comment for load-acquire aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/cbe4fdcb..a4714f54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From kvn at openjdk.org Wed Mar 22 17:50:19 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Mar 2023 17:50:19 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Wed, 22 Mar 2023 09:47:47 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 791: >> >>> 789: Compile::current()->output()->add_stub(stub); >>> 790: jcc(Assembler::notEqual, stub->entry()); >>> 791: bind(stub->continuation()); >> >> Why use stub here and not inline the code? Because the branch mostly not taken? > > Yes, the branch is mostly not taken. 
If we inline the code, then we would have to take a forward branch on the very common path to skip over the (rare) part that handles ANON monitor owner. This would throw off static branch prediction and is discouraged by the Intel optimization guide. okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1145202521 From kvn at openjdk.org Wed Mar 22 18:08:52 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Mar 2023 18:08:52 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Wed, 22 Mar 2023 09:46:04 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 670: >> >>> 668: get_thread (scrReg); // beware: clobbers ICCs >>> 669: movptr(Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), scrReg); >>> 670: xorptr(boxReg, boxReg); // set icc.ZFlag = 1 to indicate success >> >> Should this be under `if (UseFastLocking)`? > > I don't think so, unless we also want to change all the stuff in x86_32.ad to not fetch the thread before calling into fast_unlock(). However, I think it is a nice and useful change. I could break it out of this PR and get it reviewed separately, it is a side-effect of the new locking impl insofar as we always require a thread register, and allocate&fetch it before going into fast_lock(). I missed that it is under #ifndef LP64. Yes, it makes sense since you are now passing `thread` in a register. And why do we need to `get_thread()` at line 708 if we already have it? It is still hard to follow this 32-bit code. What is each register holding, what is the value `3`, and why don't we have checks similar to the LP64 code after the CAS?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1145223128 From lmesnik at openjdk.org Wed Mar 22 18:28:41 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 22 Mar 2023 18:28:41 GMT Subject: Integrated: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 23:54:13 GMT, Leonid Mesnik wrote: > The com/sun/jdi tests are located in one package, and classes with the same name cause a 'class duplication error' when this directory is opened as source code in an IDE. > > The simplest fix to avoid this is just to rename the classes. This pull request has now been integrated. Changeset: e73411a2 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/e73411a2354cf266ab7a5ddadfb6ea98d7eb4cd1 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE Reviewed-by: sspitsyn, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/13069 From sspitsyn at openjdk.org Wed Mar 22 19:06:34 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Mar 2023 19:06:34 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: Message-ID: > The fix is to enable support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API.
> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor cleanup in enable_virtual_threads_notify_jvmti() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/51c3f7d5..52ef33b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From cjplummer at openjdk.org Wed Mar 22 21:14:46 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 22 Mar 2023 21:14:46 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: Message-ID: <-sHjr_T15KCztTKCCOiaqfURgQzQ4v3FMSQXgSwpo8s=.e9a49ce0-cbf8-496c-99db-db7bd3f74445@github.com> On Wed, 22 Mar 2023 19:06:34 GMT, Serguei Spitsyn wrote: >> The fix is to enable support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. 
This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor cleanup in enable_virtual_threads_notify_jvmti() > The fix is to enable support for late binding JVMTI agents. Just to be clear, the late binding support was already there, but wasn't working properly with virtual threads. Correct? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1480265452 From cjplummer at openjdk.org Wed Mar 22 21:39:44 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 22 Mar 2023 21:39:44 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 19:06:34 GMT, Serguei Spitsyn wrote: >> The fix is to enable support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM.
This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor cleanup in enable_virtual_threads_notify_jvmti() src/hotspot/share/prims/jvmtiExport.cpp line 389: > 387: } else { > 388: JvmtiVTMSTransitionDisabler::set_VTMS_notify_jvmti_events(true); > 389: } One thing that is a little bit confusing about the changes here is that this is where the fix for [JDK-8296324](https://bugs.openjdk.org/browse/JDK-8296324) went, but for some reason the fix for [JDK-8304303](https://bugs.openjdk.org/browse/JDK-8304303), which was pushed a couple of days ago, removed the [JDK-8296324](https://bugs.openjdk.org/browse/JDK-8296324) changes even though these two bugs don't seem related. The code added by [JDK-8296324](https://bugs.openjdk.org/browse/JDK-8296324) was: ``` if (Continuations::enabled()) { // Virtual threads support. There is a performance impact when VTMS transitions are enabled. 
java_lang_VirtualThread::set_notify_jvmti_events(true); + if (JvmtiEnv::get_phase() == JVMTI_PHASE_LIVE) { + ThreadInVMfromNative __tiv(JavaThread::current()); + java_lang_VirtualThread::init_static_notify_jvmti_events(); + } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145434016 From sspitsyn at openjdk.org Wed Mar 22 21:50:43 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Mar 2023 21:50:43 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: <-sHjr_T15KCztTKCCOiaqfURgQzQ4v3FMSQXgSwpo8s=.e9a49ce0-cbf8-496c-99db-db7bd3f74445@github.com> References: <-sHjr_T15KCztTKCCOiaqfURgQzQ4v3FMSQXgSwpo8s=.e9a49ce0-cbf8-496c-99db-db7bd3f74445@github.com> Message-ID: On Wed, 22 Mar 2023 21:11:45 GMT, Chris Plummer wrote: > > The fix is to enable support for late binding JVMTI agents. > Just to be clear, the late binding support was already there, but wasn't working properly with virtual threads. Correct? Nice catch. Right, it is virtual threads support. Fixed description now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1480303929 From sspitsyn at openjdk.org Wed Mar 22 21:58:43 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Mar 2023 21:58:43 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 21:36:57 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> minor cleanup in enable_virtual_threads_notify_jvmti() > > src/hotspot/share/prims/jvmtiExport.cpp line 389: > >> 387: } else { >> 388: JvmtiVTMSTransitionDisabler::set_VTMS_notify_jvmti_events(true); >> 389: } > > One thing that is a little bit confusing about the changes here is that this is where the fix for [JDK-8296324](https://bugs.openjdk.org/browse/JDK-8296324) went, but for some reason the fix for [JDK-8304303](https://bugs.openjdk.org/browse/JDK-8304303), which was pushed a couple of days ago, removed the [JDK-8296324](https://bugs.openjdk.org/browse/JDK-8296324) changes even though these two bugs don't seem related. The code added by [JDK-8296324](https://bugs.openjdk.org/browse/JDK-8296324) was: > > ``` if (Continuations::enabled()) { > // Virtual threads support. There is a performance impact when VTMS transitions are enabled. > java_lang_VirtualThread::set_notify_jvmti_events(true); > + if (JvmtiEnv::get_phase() == JVMTI_PHASE_LIVE) { > + ThreadInVMfromNative __tiv(JavaThread::current()); > + java_lang_VirtualThread::init_static_notify_jvmti_events(); > + } > } I agree, it is confusing. This fragment was removed by JDK-8304303 because the function `java_lang_VirtualThread::init_static_notify_jvmti_events()` was being removed. It did not work correctly in the first place. It is why the JDK-8297286 was filed. Also, it was on my plan to fix JDK-8297286 soon. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145449155 From cjplummer at openjdk.org Thu Mar 23 02:05:06 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 23 Mar 2023 02:05:06 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 19:06:34 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor cleanup in enable_virtual_threads_notify_jvmti() src/hotspot/share/prims/jvmtiEnvBase.hpp line 87: > 85: > 86: static bool enable_virtual_threads_notify_jvmti(); > 87: static bool disable_virtual_threads_notify_jvmti(); "disable" only seems to be used by the WB API. Is that expected? 
src/hotspot/share/prims/jvmtiThreadState.hpp line 102: > 100: > 101: static int VTMS_transition_count() { return _VTMS_transition_count; } > 102: static void set_VTMS_transition_count(bool val) { _VTMS_transition_count = val; } Although there is a call to `set_VTMS_transition_count()`, I don't see any calls to `VTMS_transition_count()`. Are these really needed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145589024 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145589515 From dholmes at openjdk.org Thu Mar 23 02:20:40 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Mar 2023 02:20:40 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 15:57:40 GMT, Johannes Bechberger wrote: > Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. > > Tested on my M1 mac. Seems okay. But I am really over this game of whack-a-mole with ThreadWXEnable. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13144#pullrequestreview-1353732847 From sspitsyn at openjdk.org Thu Mar 23 05:46:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 23 Mar 2023 05:46:42 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: Message-ID: <94iP0mDhHYO_Ri-oeGEfd7rdzuSKrdaPg_PzzTyPRfs=.f36050a9-1e54-4b57-9d75-94d03b73c71a@github.com> On Thu, 23 Mar 2023 02:00:56 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> minor cleanup in enable_virtual_threads_notify_jvmti() > > src/hotspot/share/prims/jvmtiEnvBase.hpp line 87: > >> 85: >> 86: static bool enable_virtual_threads_notify_jvmti(); >> 87: static bool disable_virtual_threads_notify_jvmti(); > > "disable" only seems to be used by the WB API. 
Is that expected? Yes, this is from the PR description: > New function JvmtiEnvBase::disable_virtual_threads_notify_jvmti() which is needed for testing. > It is used by the WhiteBox API. Do you think a comment is needed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145702749 From sspitsyn at openjdk.org Thu Mar 23 05:54:29 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 23 Mar 2023 05:54:29 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: Message-ID: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: address review comment: remove unneeded function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/52ef33b6..89b659d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From sspitsyn at openjdk.org Thu Mar 23 05:54:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 23 Mar 2023 05:54:33 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 02:02:01 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> minor cleanup in enable_virtual_threads_notify_jvmti() > > src/hotspot/share/prims/jvmtiThreadState.hpp line 102: > >> 100: >> 101: static int VTMS_transition_count() { return _VTMS_transition_count; } >> 102: static void set_VTMS_transition_count(bool val) { _VTMS_transition_count = val; } > > Although there is a call to `set_VTMS_transition_count()`, I don't see any calls to `VTMS_transition_count()`. Are these really needed? Thanks. It was used in an assert that I recently removed as not important. Will remove it as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145704755 From jbechberger at openjdk.org Thu Mar 23 06:08:40 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 06:08:40 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 02:17:44 GMT, David Holmes wrote: > Seems okay. But I am really over this game of whack-a-mole with ThreadWXEnable. You're not the only one. Andrei Pangin added a workaround to async-profiler, so nobody noticed the problem before. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480648254 From cjplummer at openjdk.org Thu Mar 23 06:12:42 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 23 Mar 2023 06:12:42 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: <94iP0mDhHYO_Ri-oeGEfd7rdzuSKrdaPg_PzzTyPRfs=.f36050a9-1e54-4b57-9d75-94d03b73c71a@github.com> References: <94iP0mDhHYO_Ri-oeGEfd7rdzuSKrdaPg_PzzTyPRfs=.f36050a9-1e54-4b57-9d75-94d03b73c71a@github.com> Message-ID: On Thu, 23 Mar 2023 05:43:35 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.hpp line 87: >> >>> 85: >>> 86: static bool enable_virtual_threads_notify_jvmti(); >>> 87: static bool disable_virtual_threads_notify_jvmti(); >> >> "disable" only seems to be used by the WB API. Is that expected? > > Yes, this is from the PR description: >> New function JvmtiEnvBase::disable_virtual_threads_notify_jvmti() which is needed for testing. >> It is used by the WhiteBox API. > > Do you think a comment is needed? I was wondering why we would be testing something that can't ever be triggered by a regular java program. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145716490 From gcao at openjdk.org Thu Mar 23 06:21:55 2023 From: gcao at openjdk.org (Gui Cao) Date: Thu, 23 Mar 2023 06:21:55 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v11] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 17:42:35 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Improved comment for load-acquire aarch64 Looks good for me. tier1, no new errors introduced, please wait for me to finish the tier2 and tier3 tests. 
------------- PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1353916818 From stuefe at openjdk.org Thu Mar 23 07:06:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Mar 2023 07:06:43 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 15:57:40 GMT, Johannes Bechberger wrote: > Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. > > Tested on my M1 mac. Okay. Though the prospect of async profiler modifying the code cache by walking the stack seems scary. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13144#pullrequestreview-1353962184 From stuefe at openjdk.org Thu Mar 23 07:06:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Mar 2023 07:06:44 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 In-Reply-To: References: Message-ID: <3yjtW4MgHDAZQEQPcPOTFQdQm5rJ6QHQnDMLmXavVUo=.628eb73e-89d3-4926-ace7-35f57773fed1@github.com> On Thu, 23 Mar 2023 06:05:33 GMT, Johannes Bechberger wrote: >> Seems okay. But I am really over this game of whack-a-mole with ThreadWXEnable. > >> Seems okay. But I am really over this game of whack-a-mole with ThreadWXEnable. > > You're not the only one. > > Andrei Pangin added a workaround to async-profiler, so nobody noticed the problem before. @parttimenerd I think you need two reviews in hotspot for integration. You now effectively disable execution of generated code for the whole extend of AGCT? So we cannot call stub routines anymore. Won't hurt safefetch since it uses static assembly now, but something to keep in mind if you ever want to downport this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480693731 From eosterlund at openjdk.org Thu Mar 23 07:09:40 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 23 Mar 2023 07:09:40 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 15:57:40 GMT, Johannes Bechberger wrote: > Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. > > Tested on my M1 mac. Drive by comment: how async safe is WX enabler? If a thread is in the middle of it and we shoot a signal and enable, what will happen? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480697013 From jbechberger at openjdk.org Thu Mar 23 07:29:45 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 07:29:45 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 In-Reply-To: <3yjtW4MgHDAZQEQPcPOTFQdQm5rJ6QHQnDMLmXavVUo=.628eb73e-89d3-4926-ace7-35f57773fed1@github.com> References: <3yjtW4MgHDAZQEQPcPOTFQdQm5rJ6QHQnDMLmXavVUo=.628eb73e-89d3-4926-ace7-35f57773fed1@github.com> Message-ID: On Thu, 23 Mar 2023 07:03:04 GMT, Thomas Stuefe wrote: > You now effectively disable execution of generated code for the whole extend of AGCT? That's exactly what async-profiler does already https://github.com/async-profiler/async-profiler/blob/c8de91df6b090af82e91a066deb81a3afb505331/src/profiler.cpp#L383, I wonder why we don't have problems there with SafeFetch on older JVMs. > Okay. Though the prospect of async profiler modifying the code cache by walking the stack seems scary. We discussed it before. It's probably safe, but yes, my initial reaction was also "why not just remove this", but it should have a performance impact. > Drive by comment: how async safe is WX enabler? If a thread is in the middle of it and we shoot a signal and enable, what will happen?
This is a really good point. I think that making the field that stores this information `volatile` could alleviate the problem (?). This would ensure that no reordering takes place in: _wx_state = WXWrite; os::current_thread_enable_wx(_wx_state); (https://github.com/openjdk/jdk/blob/77cd917a97b184871ab2d3325ceb6c53afeca28b/src/hotspot/share/runtime/thread.inline.hpp#L78) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480713645 From jbechberger at openjdk.org Thu Mar 23 07:52:24 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 07:52:24 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v2] In-Reply-To: References: Message-ID: > Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. > > Tested on my M1 mac. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Make _wx_state volatile ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13144/files - new: https://git.openjdk.org/jdk/pull/13144/files/468d71e3..8af25ec0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13144/head:pull/13144 PR: https://git.openjdk.org/jdk/pull/13144 From jbechberger at openjdk.org Thu Mar 23 08:20:18 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 08:20:18 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: > Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. > > Tested on my M1 mac. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Use raw_thread instead of Thread::current() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13144/files - new: https://git.openjdk.org/jdk/pull/13144/files/8af25ec0..22b661dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13144/head:pull/13144 PR: https://git.openjdk.org/jdk/pull/13144 From jbechberger at openjdk.org Thu Mar 23 08:29:44 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 08:29:44 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:20:18 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use raw_thread instead of Thread::current() The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480772888 From eosterlund at openjdk.org Thu Mar 23 08:26:42 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 23 Mar 2023 08:26:42 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: <3yjtW4MgHDAZQEQPcPOTFQdQm5rJ6QHQnDMLmXavVUo=.628eb73e-89d3-4926-ace7-35f57773fed1@github.com> Message-ID: <8ygk_PGSqYUnmW9gxt5bg83dQJo1E1IPaw-TcMQPOio=.1ae61a0f-2d9b-4ba1-999a-7751b49bc578@github.com> On Thu, 23 Mar 2023 07:27:14 GMT, Johannes Bechberger wrote: > > You now effectively disable execution of generated code for the whole extend of AGCT? > > > > That's exactly what async-profiler does already https://github.com/async-profiler/async-profiler/blob/c8de91df6b090af82e91a066deb81a3afb505331/src/profiler.cpp#L383, I wonder why we don't have problems there with SafeFetch on older JVMs. > > > > > Okay. Though the prospect of async profiler modifying the code cache by walking the stack seems scary. > > > > We discussed it before. It's probably safe, but yes, my initial reaction was also "why not just remove this", but it should have a performance impact. > > > > > Drive by comment: how async safe is WX enabler? If a thread is in the middle of it and we shoot a signal and enable, what will happen? > > > > This is a really good point. I think that making the field that stores this information `volatile` could alleviate the problem (?). This would ensure that no reordering takes place in: > > > > ``` > > _wx_state = WXWrite; > > os::current_thread_enable_wx(_wx_state); > > ``` > > (https://github.com/openjdk/jdk/blob/77cd917a97b184871ab2d3325ceb6c53afeca28b/src/hotspot/share/runtime/thread.inline.hpp#L78) If you look at the enable_wx function, there is a lack of atomicity of the operation. It checks the current state and only conditionally decides to change the state.
And when it does there is a write of the new state and a call to actually flip the mode. Seems to me that there could be many problematic interleavings where the signal hits in this code, which could mess things up. This code was not designed to be async safe, and I'm not convinced that sprinkling in volatile really solves that problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480769232 From sspitsyn at openjdk.org Thu Mar 23 08:35:40 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 23 Mar 2023 08:35:40 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v4] In-Reply-To: References: <94iP0mDhHYO_Ri-oeGEfd7rdzuSKrdaPg_PzzTyPRfs=.f36050a9-1e54-4b57-9d75-94d03b73c71a@github.com> Message-ID: <2vKOdfA_ANscVsmo8hYjnooncmAQjPc81Y7UXrsl_eo=.f919cd8d-667c-4ae2-bdaa-3964bfcfd5f0@github.com> On Thu, 23 Mar 2023 06:09:42 GMT, Chris Plummer wrote: >> Yes, this is from the PR description: >>> New function JvmtiEnvBase::disable_virtual_threads_notify_jvmti() which is needed for testing. >>> It is used by the WhiteBox API. >> >> Do you think a comment is needed? > > I was wondering why we would be testing something that can't ever be triggered by a regular java program. The function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` is needed to test the `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()`. To better test enabling we need to run it multiple times, so disabling is needed as well. 
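As an illustration of the reasoning above (disabling is what makes it possible to exercise enabling more than once), here is a minimal C++ sketch. It is a toy model, not the HotSpot implementation: `ModelJavaThread`, `ModelVtmsNotify`, and the meaning of the `bool` return values (whether the mode actually changed) are assumptions made only for this example. The one detail taken from the PR description is that enabling counts the threads currently in a VTMS transition to set the counter.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a JavaThread with its VTMS transition bit.
struct ModelJavaThread {
  bool in_vtms_transition;
};

// Toy model of the enable/disable pair discussed in the review.
class ModelVtmsNotify {
 public:
  // Counts threads currently in a VTMS transition and switches
  // notifications on. Returns true if the mode changed (an assumption
  // of this sketch, not a statement about the real return value).
  bool enable(const std::vector<ModelJavaThread>& threads) {
    if (_notify_enabled) return false;
    int count = 0;
    for (const ModelJavaThread& t : threads) {
      if (t.in_vtms_transition) ++count;
    }
    _transition_count = count;
    _notify_enabled = true;
    return true;
  }

  // Switches notifications off so that enable() can be run again.
  bool disable() {
    if (!_notify_enabled) return false;
    _notify_enabled = false;
    _transition_count = 0;
    return true;
  }

  bool notify_enabled() const { return _notify_enabled; }
  int transition_count() const { return _transition_count; }

 private:
  bool _notify_enabled = false;
  int _transition_count = 0;
};
```

In this model a test can call `enable()`, then `disable()`, change which threads are mid-transition, and call `enable()` again to check that the counter is recomputed each time. Without `disable()` only the first call would do any work.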
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1145839021 From jbechberger at openjdk.org Thu Mar 23 08:44:42 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 08:44:42 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:20:18 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use raw_thread instead of Thread::current() Or even better: We could disable the caching if we're not in the write-enabled mode. This way we don't need to change any other code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480789489 From eosterlund at openjdk.org Thu Mar 23 08:39:44 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 23 Mar 2023 08:39:44 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:27:23 GMT, Johannes Bechberger wrote: > The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). Sounds like a plan. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480784139 From jbechberger at openjdk.org Thu Mar 23 08:50:45 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 08:50:45 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:20:18 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac.
> > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use raw_thread instead of Thread::current() We could just add a method `is_writable()` at https://github.com/openjdk/jdk/blob/8af25ec09d47900998b8fb58e2e7d486420df03d/src/hotspot/share/runtime/thread.hpp#L629 and call it in PcDesc code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480797638 From jbechberger at openjdk.org Thu Mar 23 09:36:43 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 09:36:43 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:36:59 GMT, Erik Österlund wrote: > > The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > > Sounds like a plan. I'm going to write some code to check the performance impact of disabling it, on Mac and Linux. So we could maybe disable the cache modification on all platforms, which reduces the potential for bugs. This should be in the spirit of @tstuefe's comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480867620 From stuefe at openjdk.org Thu Mar 23 09:44:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Mar 2023 09:44:43 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:36:59 GMT, Erik Österlund wrote: >> The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > >> The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > > Sounds like a plan. Reading @fisk's excellent catch about the async safety of stacking the WX RAII mechanics, I retract my approval.
This looks like a recipe for hard-to-find bugs :-/ Incidentally, do we see in the hs_err file whether async profiler is attached? We should maybe make that prominently visible. > > The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > > Sounds like a plan. I feel there is information missing :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480878975 From jbechberger at openjdk.org Thu Mar 23 09:48:46 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 09:48:46 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:36:59 GMT, Erik Österlund wrote: >> The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > >> The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > > Sounds like a plan. > Reading @fisk's excellent catch about the async safety of stacking the WX RAII mechanics, I retract my approval. This looks like a recipe for hard-to-find bugs :-/ Yes, this is my current thought too. > Incidentally, do we see in the hs_err file whether async profiler is attached? We should maybe make that prominently visible. We see ASGCT in the stack trace, but what exactly do you mean? > I feel there is information missing :) But would you be okay with disabling the cache modification if the perf impact is low?
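To make the idea discussed above concrete (skip the cache modification when the current thread is not in write-enabled mode), here is a minimal C++ sketch. It is a toy model, not HotSpot code: `ModelThread`, `ModelPcDesc`, `ModelPcDescTable`, and the one-entry cache are hypothetical stand-ins, and `is_writable()` is only the helper name proposed in the thread, not an existing method.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum WXMode { WXWrite, WXExec };

// Hypothetical stand-in for a thread that tracks its W^X mode.
struct ModelThread {
  WXMode wx_state = WXExec;
  // Helper along the lines proposed in the thread (name is an assumption).
  bool is_writable() const { return wx_state == WXWrite; }
};

// Hypothetical, simplified descriptor keyed by pc offset.
struct ModelPcDesc {
  int pc_offset;
  int scope_id;
};

// Lookup table with a one-entry cache. The cache is only updated when
// the calling thread is write-enabled; lookups still succeed otherwise.
class ModelPcDescTable {
 public:
  explicit ModelPcDescTable(std::vector<ModelPcDesc> descs)
      : _descs(std::move(descs)) {}

  const ModelPcDesc* find(int pc_offset, const ModelThread& current) {
    if (_has_last && _descs[_last].pc_offset == pc_offset) {
      return &_descs[_last];  // cache hit: read only, no write needed
    }
    for (std::size_t i = 0; i < _descs.size(); ++i) {
      if (_descs[i].pc_offset == pc_offset) {
        if (current.is_writable()) {  // skip the write when not allowed
          _last = i;
          _has_last = true;
          ++_cache_updates;
        }
        return &_descs[i];
      }
    }
    return nullptr;
  }

  int cache_updates() const { return _cache_updates; }

 private:
  std::vector<ModelPcDesc> _descs;
  std::size_t _last = 0;
  bool _has_last = false;
  int _cache_updates = 0;
};
```

The trade-off in this model is the one named in the discussion: a walker that is not write-enabled pays for the slow linear search on every lookup, but it never writes, so it does not need to switch modes from a signal handler.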
------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1480883378 From aph at openjdk.org Thu Mar 23 10:46:56 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 23 Mar 2023 10:46:56 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v11] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 17:42:35 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Improved comment for load-acquire aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2368: > 2366: __ tst(index, index); > 2367: __ br(Assembler::EQ, L_no_push); > 2368: Maybe I don't understand what this should do?
Suggestion: __ tbz(index, ResolvedIndyEntry::has_appendix_shift, L_no_push); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1145995182 From stuefe at openjdk.org Thu Mar 23 11:49:48 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Mar 2023 11:49:48 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:36:59 GMT, Erik Österlund wrote: >> The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > >> The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac). > > Sounds like a plan. > > Reading @fisk's excellent catch about the async safety of stacking the wx RAII mechanics, I retract my approval. This looks like a recipe for hard-to-find bugs :-/ > > Yes, this is my current thought too. > > > Incidentally, do we see in the hs_err file whether async profiler is attached? We should maybe make that prominently visible. > > We see ASGCT in the stack trace, but what exactly do you mean?

For issues like this, you wouldn't necessarily have AGCT on the stack. Consider:

- Compiler gets invoked, write protects, compiles, then tries to restore write protection. Gets interrupted by AGCT after calling pthread_jit_write_protect_np but before updating Thread::_wx_state.
- Now AGCT runs. It disables write protection, does its thing, then reinstates the state *it thinks preceded it*. But that is the wrong state. In this case, the one used during compilation.
- We return from signal handling. Compiler now sets Thread::_wx_state and assumes write protection is restored, but it isn't.
- Later that day, the thread tries to call into compiled code. It will not be able to execute it, since the protection is wrong.

There are variants of this play, but my point is the resulting crashes may happen after AGCT was invoked.
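
The interleaving described in the list above can be replayed deterministically. The following is only a minimal single-threaded sketch, not HotSpot code: `SimThread`, `hw_mode` (standing in for the per-thread state toggled by pthread_jit_write_protect_np) and `cached_mode` (standing in for Thread::_wx_state) are hypothetical names, and the handler is assumed to save and restore the cached state the way the proposed fix would.

```cpp
#include <cassert>

// Hypothetical model of the two W^X modes a thread can be in.
enum class WXMode { Write, Exec };

// Hypothetical stand-in for a thread: the real hardware state and the
// cached copy are updated in two separate, non-atomic steps.
struct SimThread {
  WXMode hw_mode = WXMode::Exec;      // models the pthread_jit_write_protect_np state
  WXMode cached_mode = WXMode::Exec;  // models Thread::_wx_state
};

// Models the signal handler of the proposed fix: remember the cached state,
// switch to Write for the stack walk, then restore what it remembered.
void signal_handler(SimThread& t) {
  WXMode saved = t.cached_mode;  // the state the handler *thinks* preceded it
  t.hw_mode = WXMode::Write;
  t.cached_mode = WXMode::Write;
  // ... the stack walk would happen here ...
  t.hw_mode = saved;
  t.cached_mode = saved;
}

// Replays the compiler leaving write mode, optionally interrupted between
// the hardware update and the cached-state update.
// Returns true if the thread ends up believing it is in Exec mode while the
// modelled hardware state is still Write.
bool run_interleaving(bool interrupt_between_steps) {
  SimThread t;
  // Compiler enters write mode to emit code.
  t.hw_mode = WXMode::Write;
  t.cached_mode = WXMode::Write;
  // Compiler leaves write mode, step 1: flip the hardware state.
  t.hw_mode = WXMode::Exec;
  if (interrupt_between_steps) {
    signal_handler(t);  // the signal arrives exactly in the gap
  }
  // Step 2: update the cached copy.
  t.cached_mode = WXMode::Exec;
  return t.cached_mode == WXMode::Exec && t.hw_mode != WXMode::Exec;
}
```

Calling `run_interleaving(true)` reproduces the mismatch from the scenario, while `run_interleaving(false)` leaves both states in agreement. Nothing fails at the point of the interrupt itself, which matches the observation that the crash would only show up later.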
So all we spy with our little eyes would be a segfault, I guess SEGV_ACCERR ?, and maybe the AGCT shared lib among the list of loaded libraries. In particular, we do not know if AGCT did interrupt the crashing thread recently. Or do we? This would be valuable information. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1481044491 From jbechberger at openjdk.org Thu Mar 23 12:03:06 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 12:03:06 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 11:46:12 GMT, Thomas Stuefe wrote: > There are variants of this play, but my point is the resulting crashes may happen after AGCT was invoked. I see the problem. Thanks for adding some clarification. We can all agree that my intended fix is not really a fix. >So all we spy with our little eyes would be a segfault, I guess SEGV_ACCERR ?, and maybe the AGCT shared lib among the list of loaded libraries. Do you mean async-profiler? This would then hard-code the name of every relevant profiler in the OpenJDK. > In particular, we do not know if AGCT did interrupt the crashing thread recently. Or do we? This would be valuable information. Yes. But it could indeed be helpful information when debugging problems. Please don't forget that JFR has a very similar code, and it would be good to have the information on whether JFR sampled a thread recently too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1481067359 From dholmes at openjdk.org Thu Mar 23 12:35:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Mar 2023 12:35:57 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:20:18 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. 
>> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use raw_thread instead of Thread::current() AGCT uses signals - most of what it does is not safe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1481116274 From dholmes at openjdk.org Thu Mar 23 12:39:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Mar 2023 12:39:44 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:20:18 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use raw_thread instead of Thread::current() So AGCT already breaks in this scenario because WX is not enabled when needed. The code to enable WX is not async-signal_safe, so the "fix" is only a partial fix - it will sometimes fail too. So we go from broken to slightly less broken. This is why AGCT is not supported. Anyway retracting my review while discussion continues. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13144#pullrequestreview-1354523473 From dholmes at openjdk.org Thu Mar 23 12:48:11 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Mar 2023 12:48:11 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 08:20:18 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac. 
> > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use raw_thread instead of Thread::current() Correction: I missed some of the discussion above. I see now this AGCT "fix" can break other code, so this makes things much more broken. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1481129143 From stuefe at openjdk.org Thu Mar 23 12:48:12 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Mar 2023 12:48:12 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 12:00:33 GMT, Johannes Bechberger wrote: > > In particular, we do not know if AGCT did interrupt the crashing thread recently. Or do we? This would be valuable information. > > Yes. But it could indeed be helpful information when debugging problems. Please don't forget that JFR has a very similar code, and it would be good to have the information on whether JFR sampled a thread recently too. Yes, but while JFR interrupts threads too, its sampler runs in its own thread, so the async-safety of the interrupted code should not matter, or? wrt info, just a little marker in the thread "AGCT was here", maybe with a timestamp, would be useful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1481130844 From jbechberger at openjdk.org Thu Mar 23 12:48:14 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 23 Mar 2023 12:48:14 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 12:43:38 GMT, Thomas Stuefe wrote: > Yes, but while JFR interrupts threads too, its sampler runs in its own thread, so the async-safety of the interrupted code should not matter, or? You're right. > wrt info, just a little marker in the thread "AGCT was here", maybe with a timestamp, would be useful. That sounds like a great idea. 
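
A marker of the kind suggested here could be as small as one atomic timestamp per thread. This is a hedged sketch only: `SamplerMarker`, `record` and `nanos_since_sample` are hypothetical names, not existing HotSpot APIs, and the caller is assumed to supply the timestamp from a source that is safe to read in a signal handler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-thread marker recording when a sampler last interrupted
// the thread. A relaxed store to a lock-free atomic is usable from a signal
// handler; whether std::atomic<int64_t> is lock-free depends on the platform.
struct SamplerMarker {
  std::atomic<std::int64_t> last_sample_nanos{0};  // 0 means "never sampled"

  // Called by the sampler when it interrupts the thread.
  void record(std::int64_t now_nanos) {
    last_sample_nanos.store(now_nanos, std::memory_order_relaxed);
  }

  // Called by the error reporter: how long ago the last sample happened,
  // or -1 if the thread was never sampled.
  std::int64_t nanos_since_sample(std::int64_t now_nanos) const {
    std::int64_t last = last_sample_nanos.load(std::memory_order_relaxed);
    return last == 0 ? -1 : now_nanos - last;
  }
};
```

An error report could then print the value of `nanos_since_sample` for the crashing thread, which would answer the question of whether a sampler interrupted it recently.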
------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1481133216 From jwilhelm at openjdk.org Thu Mar 23 13:29:04 2023 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Thu, 23 Mar 2023 13:29:04 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
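
The lock-stack and the ANONYMOUS_OWNER hand-over described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the PR: `LockStackSketch`, `ThreadSketch`, `MonitorSketch`, `anonymous_owner`, `inflate_by_contender` and `claim_on_exit` are all hypothetical names, object references are modelled as plain pointers, and a `std::vector` stands in for the growable array of oops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Oop = const void*;  // hypothetical stand-in for an object reference

// Per-thread lock stack: a small growable array of locked objects.
class LockStackSketch {
 public:
  void push(Oop obj) { _elems.push_back(obj); }  // on fast-lock

  // "Does the current thread own me?" is a linear scan of a tiny array.
  bool contains(Oop obj) const {
    return std::find(_elems.begin(), _elems.end(), obj) != _elems.end();
  }

  void remove(Oop obj) {  // on fast-unlock
    auto it = std::find(_elems.begin(), _elems.end(), obj);
    if (it != _elems.end()) _elems.erase(it);
  }

  std::size_t size() const { return _elems.size(); }

 private:
  std::vector<Oop> _elems;
};

struct ThreadSketch {
  LockStackSketch lock_stack;
};

// Unique sentinel address playing the role of ANONYMOUS_OWNER.
inline const void* anonymous_owner() {
  static const char marker = 0;
  return &marker;
}

struct MonitorSketch {
  const void* owner = nullptr;
};

// A contending thread inflates the lock without knowing who holds it.
void inflate_by_contender(MonitorSketch& m) { m.owner = anonymous_owner(); }

// At monitorexit the holder sees the sentinel, concludes it must be the
// owner, and installs itself before doing the regular monitor exit.
void claim_on_exit(MonitorSketch& m, ThreadSketch& self) {
  if (m.owner == anonymous_owner()) {
    m.owner = &self;
  }
}
```

The ownership query scans only the current thread's own array, which is why it stays cheap as long as the array remains in the 3-5 element range mentioned above.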
>> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Set condition flags correctly after fast-lock call on aarch64

UseNewLocks... Surely there must be a better name? For how long will these be the new locks? Do we rename the flag to UseOldLocks when the next locking scheme comes along? There must be some property that differentiates these locks from the older locks other than being new. Why not name the flag after that property? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1481191815 From matsaave at openjdk.org Thu Mar 23 15:09:48 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 23 Mar 2023 15:09:48 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v11] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 17:42:35 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Improved comment for load-acquire aarch64 src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 2294: > 2292: > 2293: __ load_resolved_indy_entry(cache, index); > 2294: __ ld_ptr(method, in_bytes(ResolvedIndyEntry::method_offset()), cache); @reinrich this load needs acquire semantics. src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 2308: > 2306: // Update registers with resolved info > 2307: __ load_resolved_indy_entry(cache, index); > 2308: __ ld_ptr(method, in_bytes(ResolvedIndyEntry::method_offset()), cache); @reinrich same for this one. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1146343430 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1146343487 From mdoerr at openjdk.org Thu Mar 23 15:16:33 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 23 Mar 2023 15:16:33 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v11] In-Reply-To: References: Message-ID: <9dzvGNE-OGaFIhBkLExkMBt8eHekU42iRM7DXKmNnyo=.2ad0584d-3366-4feb-a8ed-da5270712d5c@github.com> On Thu, 23 Mar 2023 15:06:21 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Improved comment for load-acquire aarch64 > > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 2294: > >> 2292: >> 2293: __ load_resolved_indy_entry(cache, index); >> 2294: __ ld_ptr(method, in_bytes(ResolvedIndyEntry::method_offset()), cache); > > @reinrich this load needs acquire semantics. We already have ordering by `isync(); // Order load wrt. succeeding loads.` below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1146353107 From matsaave at openjdk.org Thu Mar 23 15:37:27 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 23 Mar 2023 15:37:27 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v12] In-Reply-To: References: Message-ID: <2STtfSR29CesdgF3zkeMQBqkHjZk32Yqss0NURZDHEI=.f3d8c313-a0f1-4282-b215-59f5a948b2e8@github.com> > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
> > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Andrew comments aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/a4714f54..546291fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=10-11 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From rkennke at openjdk.org Thu Mar 23 16:00:17 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 23 Mar 2023 16:00:17 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 23 Mar 2023 13:25:52 GMT, Jesper Wilhelmsson wrote: > UseNewLocks... Surely there must be a better name? For how long will these be the new locks? Do we rename the flag to UseOldLocks when the next locking scheme comes along? 
There must be some property that differentiates these locks from the older locks other than being new. Why not name the flag after that property? The main difference is that this new approach doesn't overload the object header with a stack-pointer in a racy way. UseNonRacyLocking? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1481450601 From dcubed at openjdk.org Thu Mar 23 16:10:04 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 23 Mar 2023 16:10:04 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Set condition flags correctly after fast-lock call on aarch64

I didn't see a reply to this comment from last week: https://github.com/openjdk/jdk/pull/10907#issuecomment-1472649263 > Another way to look at the option name question is to invert the sense of > the option. The old stack-locking code would be enabled by this new > UseStackLocking option (which would be on by default for now) and the > newer locking code that uses a lock-stack that is embedded in the JavaThread > would be the "else" case of the temporary UseStackLocking option.
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1481462127 From rkennke at openjdk.org Thu Mar 23 16:10:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 23 Mar 2023 16:10:05 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
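The scheme described above can be illustrated with a small Java model. This is only a sketch under stated assumptions, not HotSpot code: the class `LockStackSketch`, the nested `Obj` and `LockStack` types, and all method names are hypothetical, and the tag values other than 00 for 'fast-locked' are assumed for illustration. It models the two header bits, a per-thread lock-stack that grows on demand, the 'does the current thread own me?' scan, and the ANONYMOUS_OWNER hand-over at exit. Races between inflation and exit are deliberately ignored.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class LockStackSketch {
    // Two-bit lock tags. Only 00 = fast-locked comes from the description;
    // the other two values are assumptions made for this sketch.
    public static final int UNLOCKED = 0b01;
    public static final int FAST_LOCKED = 0b00;
    public static final int MONITOR = 0b10;

    // Marker recorded by a contending thread that cannot tell who owns the lock.
    public static final Object ANONYMOUS_OWNER = new Object();

    /** Hypothetical stand-in for an object header plus its inflated monitor. */
    public static final class Obj {
        public final AtomicInteger lockBits = new AtomicInteger(UNLOCKED);
        public final AtomicReference<Object> monitorOwner = new AtomicReference<>();
    }

    /** Small per-thread array of locked objects, grown when full. */
    static final class LockStack {
        private Obj[] elems = new Obj[4];
        private int top = 0;

        void push(Obj o) {
            if (top == elems.length) {
                elems = Arrays.copyOf(elems, top * 2); // grow on overflow
            }
            elems[top++] = o;
        }

        boolean contains(Obj o) {
            for (int i = top - 1; i >= 0; i--) {
                if (elems[i] == o) return true;
            }
            return false;
        }

        void remove(Obj o) {
            for (int i = top - 1; i >= 0; i--) {
                if (elems[i] == o) {
                    System.arraycopy(elems, i + 1, elems, i, top - i - 1);
                    elems[--top] = null;
                    return;
                }
            }
        }
    }

    private static final ThreadLocal<LockStack> LOCK_STACK =
            ThreadLocal.withInitial(LockStack::new);

    /** CAS the tag bits to fast-locked and remember the object on this thread's lock-stack. */
    public static boolean tryFastLock(Obj o) {
        if (o.lockBits.compareAndSet(UNLOCKED, FAST_LOCKED)) {
            LOCK_STACK.get().push(o);
            return true;
        }
        return false; // contended or recursive: a real VM would inflate here
    }

    /** 'Does the current thread own me?' answered by scanning the lock-stack. */
    public static boolean ownedByCurrentThread(Obj o) {
        return LOCK_STACK.get().contains(o);
    }

    /** What a contending thread does: inflate and record an anonymous owner. */
    public static boolean inflateByContender(Obj o) {
        if (o.lockBits.compareAndSet(FAST_LOCKED, MONITOR)) {
            o.monitorOwner.set(ANONYMOUS_OWNER);
            return true;
        }
        return false;
    }

    /** Exit path: undo the fast-lock, or fix up an anonymous owner and release the monitor. */
    public static void exit(Obj o) {
        LockStack ls = LOCK_STACK.get();
        if (!ls.contains(o) && o.monitorOwner.get() != Thread.currentThread()) {
            throw new IllegalMonitorStateException("current thread does not own the lock");
        }
        if (ls.contains(o) && o.lockBits.compareAndSet(FAST_LOCKED, UNLOCKED)) {
            ls.remove(o);
            return;
        }
        // The lock was inflated while we held it: the anonymous owner must be us.
        if (o.monitorOwner.compareAndSet(ANONYMOUS_OWNER, Thread.currentThread())) {
            ls.remove(o);
        }
        o.monitorOwner.compareAndSet(Thread.currentThread(), null);
    }

    public static void main(String[] args) {
        Obj o = new Obj();
        System.out.println("locked: " + tryFastLock(o));
        System.out.println("owned: " + ownedByCurrentThread(o));
        inflateByContender(o); // would normally run on another thread
        exit(o);
        System.out.println("owned after exit: " + ownedByCurrentThread(o));
    }
}
```

In this model a second `tryFastLock` on an object the thread already holds fails, which mirrors the statement that recursive locking is not handled by the fast path and would instead lead to inflation.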
>> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Set condition flags correctly after fast-lock call on aarch64

> I didn't see a reply to this comment from last week: [#10907 (comment)](https://github.com/openjdk/jdk/pull/10907#issuecomment-1472649263)
>
> > Another way to look at the option name question is to invert the sense of
> > the option. The old stack-locking code would be enabled by this new
> > UseStackLocking option (which would be on by default for now) and the
> > newer locking code that uses a lock-stack that is embedded in the JavaThread
> > would be the "else" case of the temporary UseStackLocking option.

Right, that would be a possibility. It seems somewhat unnatural to put the old code under the true branch of an experimental option, but unless we come up with a better name, I'd be willing to accept this. It's all temporary anyway (hopefully) and not meant to be used by end-users, really.
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1481467564 From dcubed at openjdk.org Thu Mar 23 16:17:22 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 23 Mar 2023 16:17:22 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Set condition flags correctly after fast-lock call on aarch64 Of course, we'll still have the problem that in the comments we talk about "stack-locking" for the old scheme of using the BasicLock on the JavaThread's stack and, with this project's comments, we talk about "fast-locking" for the new scheme of using a stack-lock embedded in the JavaThread. We'll still need a better name than "fast-locking" in the comments...
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1481482032 From rkennke at openjdk.org Thu Mar 23 16:48:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 23 Mar 2023 16:48:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Set condition flags correctly after fast-lock call on aarch64 Is anybody familiar with the academic literature on this topic? I am sure I am not the first person which has come up with this form of locking. Maybe we could use a name that refers to some academic paper? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1481512899 From cjplummer at openjdk.org Thu Mar 23 18:21:28 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 23 Mar 2023 18:21:28 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 05:54:29 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents.
>> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > address review comment: remove unneeded function src/hotspot/share/prims/jvmtiThreadState.hpp line 101: > 99: static void set_VTMS_notify_jvmti_events(bool val) { _VTMS_notify_jvmti_events = val; } > 100: > 101: static void set_VTMS_transition_count(bool val) { _VTMS_transition_count = val; } Why set the count if it is never going to be used? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 38: > 36: */ > 37: > 38: //import compiler.whitebox.CompilerWhiteBoxTest; Remove test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 68: > 66: if (n <= 0) { > 67: n = 1000; > 68: ToggleNotifyJvmtiTest.sleep(1); It looks like you do this short sleep 1 out of every 1,000,000 iterations. Can you explain why? 
test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 72: > 70: if (i > n) { > 71: i = 0; > 72: n = n - 1; n-- test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 74: > 72: n = n - 1; > 73: } > 74: i = i + 1; i++ test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 148: > 146: > 147: static private void setVirtualThreadsNotifyJvmtiMode(int iter, boolean enable) { > 148: sleep(5); Why is this needed? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 157: > 155: > 156: if (args.length > 0 && args[0].equals("attach")) { // agent loaded into running VM case > 157: String arg = args.length == 2 ? args[1] : ""; I don't see any args being passed in other than "attach"? What might `arg` be set to? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 161: > 159: vm.loadAgentLibrary(AGENT_LIB, arg); > 160: } else { > 161: System.loadLibrary(AGENT_LIB); Why is this needed? Isn't the library already loaded due to it being specified by `-agentlib`? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 171: > 169: sleep(20); > 170: > 171: for (int iter = 0; VirtualThreadStartedCount() < VTHREADS_CNT; iter++) { The test seems to exit once all the threads are started. I would think you would want it to run for a while after all the threads are started. test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 32: > 30: > 31: static jvmtiEnv *jvmti; > 32: static int vthread_started_cnt = 0; Needs to be volatile? 
test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 34: > 32: static int vthread_started_cnt = 0; > 33: static jrawMonitorID agent_lock = NULL; > 34: static bool can_support_vt_enabled = false; This variable doesn't seem to be needed. You set it `true` later on, but never reference it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146578471 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146594032 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146608558 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146598256 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146598429 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146625993 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146620881 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146624531 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146629460 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146630461 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146633746 From lmesnik at openjdk.org Thu Mar 23 21:58:08 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 23 Mar 2023 21:58:08 GMT Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) Message-ID: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> TestScaffold parses options incorrectly; it should insert the wrapper class between the VM options and the application's class name.
------------- Commit messages: - parsing updted - problemlist fixed - comments - fix Changes: https://git.openjdk.org/jdk/pull/13170/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13170&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304834 Stats: 34 lines in 2 files changed: 20 ins; 10 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13170/head:pull/13170 PR: https://git.openjdk.org/jdk/pull/13170 From lmesnik at openjdk.org Thu Mar 23 22:29:51 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 23 Mar 2023 22:29:51 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 05:54:29 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. 
>> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failing tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run tiers 1-6 to make sure there are no regressions. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > address review comment: remove unneeded function There are a few comments in the code. Also, I think it would be nice to have a high-level comment about the synchronization between VTMTDisable and a late-attached agent: what is set and why a VMOp is needed. BTW, why is a VMOp used rather than a handshake? src/hotspot/share/prims/jvmtiEnvBase.cpp line 1581: > 1579: return false; > 1580: } > 1581: if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) { Shouldn't this be if (!JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) { here? src/hotspot/share/prims/whitebox.cpp line 2537: > 2535: } > 2536: #endif > 2537: return result; The test never checks the result, so it might be better to generate a fatal error and fail here, or throw an exception, to catch bugs earlier? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 82: > 80: try { > 81: while (!threadReady) { > 82: sleep(1); Use ToggleNotifyJvmtiTest.sleep(1);? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 77: > 75: > 76: err = jvmti->AddCapabilities(&caps); > 77: if (err != JVMTI_ERROR_NONE) { Can you use check_jvmti_status to ensure that the result is not an error?
test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 82: > 80: } > 81: err = jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_VIRTUAL_THREAD_START, NULL); > 82: if (err != JVMTI_ERROR_NONE) { 2nd Can you use check_jvmti_status to ensure that result is not error? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 89: > 87: LOG("Agent init: can_support_virtual_threads capability: %d\n", caps.can_support_virtual_threads); > 88: > 89: err = jvmti->SetEventCallbacks(&callbacks, (jint)sizeof(callbacks)); 3rd Can you use check_jvmti_status to ensure that result is not error? ------------- Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13133#pullrequestreview-1355740399 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146929861 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146934499 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146936003 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146938334 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146938504 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146938603 From sspitsyn at openjdk.org Fri Mar 24 00:19:32 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 00:19:32 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 17:49:31 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > src/hotspot/share/prims/jvmtiThreadState.hpp line 101: > >> 99: static void set_VTMS_notify_jvmti_events(bool val) { 
_VTMS_notify_jvmti_events = val; } >> 100: >> 101: static void set_VTMS_transition_count(bool val) { _VTMS_transition_count = val; } > > Why set the count if it is never going to be used? The counter `_VTMS_transition_count` is directly used by the `jvmtiVTMSTransitionDisabler`. > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 38: > >> 36: */ >> 37: >> 38: //import compiler.whitebox.CompilerWhiteBoxTest; > > Remove Thanks. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146994030 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146994886 From sspitsyn at openjdk.org Fri Mar 24 00:26:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 00:26:33 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: <0J5IbRtwgK7aOiXWfWBSVLo0RniWf6rRfvqmqA59j5A=.a7cf4ca2-a750-4281-9c78-285ca58e21da@github.com> On Thu, 23 Mar 2023 18:02:29 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 68: > >> 66: if (n <= 0) { >> 67: n = 1000; >> 68: ToggleNotifyJvmtiTest.sleep(1); > > It looks like you do this short sleep 1 out of every 1,000,000 iterations. Can you explain why? It is for yielding. Do you think we need this with a bigger frequency? > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 72: > >> 70: if (i > n) { >> 71: i = 0; >> 72: n = n - 1; > > n-- This code was copied originally from the vmTestbase to SuspendThread* tests and some other tests. 
I can do all the suggested simplifications, but I am not sure it is really necessary. It does not matter what exactly the method does because it just simulates some thread activity. Would it be better to keep the copy/pasted methods the same? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146998399 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1146997072 From sspitsyn at openjdk.org Fri Mar 24 00:43:34 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 00:43:34 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 18:07:23 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 161: > >> 159: vm.loadAgentLibrary(AGENT_LIB, arg); >> 160: } else { >> 161: System.loadLibrary(AGENT_LIB); > > Why is this needed? Isn't the library already loaded due to it being specified by `-agentlib`? Good question. We almost always do it in the JVMTI tests, including the `serviceability/jvmti/vthread` and `vmTestbase/nsk/jvmti` tests. For example, 22 `serviceability/jvmti/vthread` tests do this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147009306 From sspitsyn at openjdk.org Fri Mar 24 01:00:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 01:00:33 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: <5KI1mSSGb__Xsp4mKAlOmEYu2Sq7QpeNWat3G1_cYjM=.42488df5-28dd-4e84-9c89-83a045642326@github.com> On Thu, 23 Mar 2023 18:08:47 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 148: > >> 146: >> 147: static private void setVirtualThreadsNotifyJvmtiMode(int iter, boolean enable) { >> 148: sleep(5); > > Why is this needed? It is needed to balance enabling/disabling notifyJvmti mode with the ThreadStart/VirtualThreadStart events. Otherwise, many mode switches can be observed without any events which is not interesting. We need to allow virtual threads to execute a little bit after a mode switch. > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 171: > >> 169: sleep(20); >> 170: >> 171: for (int iter = 0; VirtualThreadStartedCount() < VTHREADS_CNT; iter++) { > > The test seems to exit once all the threads are started. I would think you would want it to run for a while after all the threads are started. I'm not sure if it is really needed. 60 virtual threads are started. Some of them are executed long enough before shutdown. We can just increase the number of threads if necessary. 
> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 32: > >> 30: >> 31: static jvmtiEnv *jvmti; >> 32: static int vthread_started_cnt = 0; > > Needs to be volatile? Thanks. We use a RawMonitor for sync. But anyway I made it volatile now. > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 34: > >> 32: static int vthread_started_cnt = 0; >> 33: static jrawMonitorID agent_lock = NULL; >> 34: static bool can_support_vt_enabled = false; > > This variable doesn't seem to be needed. You set it `true` later on, but never reference it. Good catch. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147011556 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147013785 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147014957 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147016373 From sspitsyn at openjdk.org Fri Mar 24 01:51:43 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 01:51:43 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: <5QPEmF5uNmAKzzpmEAeuis02ozKFfSaothwqLe2iSF4=.c4d56cf5-83a5-45d8-b616-25c0a965726b@github.com> On Thu, 23 Mar 2023 17:59:52 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 74: > >> 72: n = n - 1; >> 73: } >> 74: i = i + 1; > > i++ See above. 
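As a side note on the busy loop discussed in this thread (the `n`/`i` counters with the occasional `sleep(1)`), the following is a minimal sketch for estimating how often the sleep branch is reached. It assumes the loop body consists only of the lines quoted in the review; the class name `BusyLoopSketch`, the method name, and the starting values of `i` and `n` are hypothetical and not taken from the test. Under those assumptions the sleep is reached roughly once per half a million iterations.

```java
// Hypothetical sketch, not the test's code: counts how many loop iterations
// pass between two consecutive "sleep" points of the quoted busy loop.
final class BusyLoopSketch {

    static long iterationsBetweenSleeps() {
        int i = 1;       // assumed state right after n has been counted down to 0
        int n = 0;       // forces the sleep branch on the first iteration
        long count = 0;  // iterations completed since the first sleep point
        int sleeps = 0;

        while (true) {
            if (n <= 0) {
                sleeps++;
                if (sleeps == 2) {
                    return count;   // reached the second sleep point
                }
                n = 1000;
                // the real test calls ToggleNotifyJvmtiTest.sleep(1) here
            }
            if (i > n) {
                i = 0;
                n = n - 1;
            }
            i = i + 1;
            count++;
        }
    }

    public static void main(String[] args) {
        System.out.println("iterations between sleeps: " + iterationsBetweenSleeps());
    }
}
```

If the sleep is meant as a yield point, changing the reset value of `n` is the obvious knob: the interval grows roughly quadratically with it.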
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147039026 From sspitsyn at openjdk.org Fri Mar 24 01:55:37 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 01:55:37 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: <_2AVe6wwMuSFdUmNRIf1JgXmJ6HFNJzBpkHT2HUqGiE=.51782910-04c7-46ad-b7be-331aa411f52b@github.com> On Thu, 23 Mar 2023 22:27:07 GMT, Leonid Mesnik wrote: > There are few comments in the code. > Also, I think it would be nice to have a high-level comment about sync of VTMTDisable and lately attached agent. > What is set and why VMOp is needed. > BTW why VMOp and not handshake is used? Good suggestion, thanks! I kept thinking where to add these types of comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1482140356 From sspitsyn at openjdk.org Fri Mar 24 02:03:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 02:03:33 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 22:11:58 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1581: > >> 1579: return false; >> 1580: } >> 1581: if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) { > > shouldn't be > if (!JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) { > here? This is nice catch, thanks! 
> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 82: > >> 80: try { >> 81: while (!threadReady) { >> 82: sleep(1); > > Use ToggleNotifyJvmtiTest.sleep(1);? Good idea, thanks. Simplified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147044376 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147043984 From dholmes at openjdk.org Fri Mar 24 02:51:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Mar 2023 02:51:51 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 23 Mar 2023 16:32:57 GMT, Roman Kennke wrote: > Is anybody familiar with the academic literature on this topic? I am sure I am not the first person which has come up with this form of locking. Maybe we could use a name that refers to some academic paper? Well not to diminish this in any way but all you are doing is moving the lock-record from the stack frame (indexed from the markword) to a heap allocated side-table (indexed via the thread itself). The "fast-locking" is still the bit that use the markword to indicate the locked state, and that hasn't changed. Encoding lock state in an object header has a number of names in the literature, depending on whose scheme it was: IBM had ThinLocks; the Sun Research VM (EVM) had meta-locks; etc. Hotspot doesn't really have a name for its variation. And as I said you aren't changing that aspect but modifying what data structure is used to access the lock-records. So the property Jesper was looking for, IMO, may be something like `UseHeapLockRecords` - though that can unfortunately be parsed as using records for the HeapLock. 
:( I think it was mentioned somewhere above that in the Java Object Monitor prototyping work we avoided using these kinds of boolean flags by defining a single "policy" flag that could take on different values for different implementation schemes. These are simply numbered, so for example: - policy 0: use existing/legacy locking with stack-based lock records - policy 1: use heavyweight locks (ie UseHeavyMonitors) - policy 2 use the new approach with heap-allocated lock-records ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1482176363 From sspitsyn at openjdk.org Fri Mar 24 03:01:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 03:01:41 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 22:24:45 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 77: > >> 75: >> 76: err = jvmti->AddCapabilities(&caps); >> 77: if (err != JVMTI_ERROR_NONE) { > > Can you use check_jvmti_status to ensure that result is not error? No, the `check_jvmti_status` needs a `JNIEnv*` which is not available in the context of `Agent_OnLoad`. > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 82: > >> 80: } >> 81: err = jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_VIRTUAL_THREAD_START, NULL); >> 82: if (err != JVMTI_ERROR_NONE) { > > 2nd Can you use check_jvmti_status to ensure that result is not error? Please, see above. 
> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 89: > >> 87: LOG("Agent init: can_support_virtual_threads capability: %d\n", caps.can_support_virtual_threads); >> 88: >> 89: err = jvmti->SetEventCallbacks(&callbacks, (jint)sizeof(callbacks)); > > 3rd Can you use check_jvmti_status to ensure that result is not error? Please, see above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147069349 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147069479 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147069520 From sspitsyn at openjdk.org Fri Mar 24 03:14:32 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 24 Mar 2023 03:14:32 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 22:18:51 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > src/hotspot/share/prims/whitebox.cpp line 2537: > >> 2535: } >> 2536: #endif >> 2537: return result; > > The test never check results, so it might be better to generate fatal error and fail here/throw exception to catch bug earlier? Thank you for the comment. I think, the decision to throw exception has to be in the test. 
Modified the test as below:
<<< WB.setVirtualThreadsNotifyJvmtiMode(enable);
---
>>> boolean status = WB.setVirtualThreadsNotifyJvmtiMode(enable);
>>> if (!status) {
>>>     throw new RuntimeException("Java: failed to set VirtualThreadsNotifyJvmtiMode: " + enable);
>>> }
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147075146 From dholmes at openjdk.org Fri Mar 24 04:42:29 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Mar 2023 04:42:29 GMT Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) In-Reply-To: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> Message-ID: On Thu, 23 Mar 2023 21:49:55 GMT, Leonid Mesnik wrote: > TestScaffold parses options incorrectly; it should insert the wrapper class between the VM options and the application's class name. Sorry, I'm struggling a bit to see where the current parsing logic is failing. Can you give an example of a command line that will get processed incorrectly? I'm not sure why the generic "if it starts with `-` then add it to vmargs" is needed, as any option that starts with `-` should either be one known by the framework and explicitly handled in the parsing, or else be a -J-XXX flag to pass to the VM. So AFAICS the existing logic will just append anything it doesn't recognise to the app command line - and that seems the right thing to do.
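To make the behaviour under discussion concrete, here is a minimal sketch of "insert the wrapper class between the VM options and the application's class name". It is not TestScaffold's code: the class `WrapperInsertionSketch`, the method `insertWrapper`, and the rule "every leading argument that starts with `-` is a VM option" are assumptions for illustration only. In particular, the sketch does not handle options that take a separate value (such as `-cp <path>`), which is exactly the kind of case where a generic "starts with `-`" rule can go wrong.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: places a wrapper class name between the leading
// '-' options and the first non-option argument (assumed to be the class name).
final class WrapperInsertionSketch {

    static List<String> insertWrapper(String[] args, String wrapperClass) {
        List<String> result = new ArrayList<>();
        int i = 0;
        // Assumption: leading arguments starting with '-' are VM options.
        while (i < args.length && args[i].startsWith("-")) {
            result.add(args[i]);
            i++;
        }
        // The wrapper goes right before the application's class name.
        result.add(wrapperClass);
        // Everything else (class name and its arguments) is kept in order.
        while (i < args.length) {
            result.add(args[i]);
            i++;
        }
        return result;
    }

    public static void main(String[] unused) {
        String[] cmd = { "-Xmx64m", "-Dfoo=bar", "MyApp", "appArg" };
        System.out.println(insertWrapper(cmd, "Wrapper"));
    }
}
```

A concrete failing command line, as requested in the reply above, would show whether such a prefix rule is actually needed or whether the existing explicit option handling already covers it.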
------------- PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1482239229 From dholmes at openjdk.org Fri Mar 24 06:13:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Mar 2023 06:13:57 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Set condition flags correctly after fast-lock call on aarch64

src/hotspot/share/interpreter/interpreterRuntime.cpp line 741:

> 739: // This is a hack to get around the limitation of registers in x86_32. We really
> 740: // send an oopDesc* instead of a BasicObjectLock*.
> 741: Handle h_obj(current, oop((reinterpret_cast<oopDesc*>(elem))));

Wouldn't it be cleaner to retype elem as a void* and cast to the correct type?
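For readers following the lock-stack description quoted above (a small per-thread array of object references, scanned to answer "does the current thread own me?", grown when needed), the idea can be sketched in Java as follows. This is only an illustration under stated assumptions, not the HotSpot implementation: the class `LockStackSketch`, its method names, the initial capacity of 4, and the doubling growth policy are all hypothetical, and removal by scanning is just one possible way to handle unlock.

```java
import java.util.Arrays;

// Hypothetical sketch of a per-thread lock-stack: a small array of object
// references recording which objects the owning thread has fast-locked.
final class LockStackSketch {
    private Object[] elems = new Object[4]; // assumed small initial capacity
    private int top = 0;                    // number of entries in use

    // Record that the owning thread has fast-locked 'o', growing if full.
    void push(Object o) {
        if (top == elems.length) {
            elems = Arrays.copyOf(elems, elems.length * 2);
        }
        elems[top++] = o;
    }

    // "Does the current thread own me?": a linear scan, newest entry first.
    boolean contains(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (elems[i] == o) {   // identity comparison, not equals()
                return true;
            }
        }
        return false;
    }

    // Remove the most recent entry for 'o' on unlock; returns false if absent.
    boolean remove(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (elems[i] == o) {
                System.arraycopy(elems, i + 1, elems, i, top - i - 1);
                elems[--top] = null; // clear the vacated slot
                return true;
            }
        }
        return false;
    }

    int size() {
        return top;
    }
}
```

Because the array is expected to stay at a handful of entries, the linear scan in `contains` is cheap; a query such as "which thread owns X?" would instead have to scan the lock-stacks of all threads, which matches the description of it being reserved for less performance-critical paths.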
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147156418 From dholmes at openjdk.org Fri Mar 24 06:19:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Mar 2023 06:19:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: <49yCA-Vx9caLf1KSVYnST3QsQ_kJZhny4KKt6kQnapQ=.66c7d6dd-a076-40d1-9dad-ebecf6805674@github.com> On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Set condition flags correctly after fast-lock call on aarch64 src/hotspot/share/oops/oop.cpp line 126: > 124: // Outside of a safepoint, the header could be changing (for example, > 125: // another thread could be inflating a lock on this object). > 126: if (ignore_mark_word || UseFastLocking) { Not clear why UseFastLocking appears here instead of in the return expression - especially given the comment above. src/hotspot/share/runtime/arguments.cpp line 1997: > 1995: } > 1996: if (UseHeavyMonitors) { > 1997: FLAG_SET_DEFAULT(UseFastLocking, false); Probably should be an error if both are set on the command-line. 
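For readers following the quoted description, the thread-local lock-stack and the ANONYMOUS_OWNER handover can be sketched roughly as below. This is an illustrative model only, not the HotSpot code under review: `LockStackModel`, `MonitorModel`, `kAnonymousOwner`, the fixed capacity of 8, and the use of plain `const void*` in place of oops are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a per-thread lock-stack: a small array of object
// references, pushed on fast-lock and removed on fast-unlock.
class LockStackModel {
 public:
  static constexpr std::size_t kCapacity = 8;  // assumed; the description says the real one grows

  bool is_full() const { return _top == kCapacity; }

  // The caller is expected to have checked is_full() first (the overflow check).
  void push(const void* obj) {
    assert(!is_full());
    _slots[_top++] = obj;
  }

  // Removes the most recently pushed entry for obj; returns false if absent.
  bool remove(const void* obj) {
    for (std::size_t i = _top; i > 0; --i) {
      if (_slots[i - 1] == obj) {
        for (std::size_t j = i; j < _top; ++j) _slots[j - 1] = _slots[j];
        --_top;
        return true;
      }
    }
    return false;
  }

  // "Does the current thread own me?" is a short linear scan.
  bool contains(const void* obj) const {
    for (std::size_t i = 0; i < _top; ++i) {
      if (_slots[i] == obj) return true;
    }
    return false;
  }

  std::size_t size() const { return _top; }

 private:
  std::array<const void*, kCapacity> _slots{};
  std::size_t _top = 0;
};

// Hypothetical model of an inflated monitor whose owner may be unknown.
struct MonitorModel {
  static constexpr std::uintptr_t kAnonymousOwner = 1;  // assumed marker value
  std::uintptr_t owner = 0;                             // 0 means unowned

  // A thread that finds the anonymous marker and has the object on its own
  // lock-stack concludes it is the owner, records itself, then releases.
  bool exit(std::uintptr_t self, LockStackModel& stack, const void* obj) {
    if (owner == kAnonymousOwner && stack.contains(obj)) {
      owner = self;
      stack.remove(obj);
    }
    if (owner != self) return false;  // not the owner: nothing is released
    owner = 0;
    return true;
  }
};
```

The linear scan in `contains` is cheap only because the stack is expected to stay at a handful of entries, which is the observation the description relies on.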
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147159067
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147159460

From lmesnik at openjdk.org Fri Mar 24 06:31:15 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 24 Mar 2023 06:31:15 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[])
In-Reply-To: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID: <1-OWyOhgZaUx8BefLbyZyOJNZtZ9_LiKfnIsXMhT67I=.e826f98a-afea-45d2-b29b-33299bfbf498@github.com>

On Thu, 23 Mar 2023 21:49:55 GMT, Leonid Mesnik wrote:

> TestScaffold parses options incorrectly: it should insert the wrapper class between the VM options and the application's class name.

I've added a comment with an example from one of the tests and the expected result of parsing. The problem was that the wrapper inserted its arguments before the VM options because it didn't expect them in the args.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1482315285

From lmesnik at openjdk.org Fri Mar 24 06:31:14 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 24 Mar 2023 06:31:14 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

> TestScaffold parses options incorrectly: it should insert the wrapper class between the VM options and the application's class name.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:

  added comments and trim arguments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13170/files
  - new: https://git.openjdk.org/jdk/pull/13170/files/e7c5e99a..1b74ae22

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13170&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13170&range=00-01

  Stats: 21 lines in 1 file changed: 8 ins; 1 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/13170.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13170/head:pull/13170

PR: https://git.openjdk.org/jdk/pull/13170

From dholmes at openjdk.org Fri Mar 24 06:42:58 2023
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 24 Mar 2023 06:42:58 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID: <7E8dbbZfswhEgM9yghtrXzUklVrZSCX2N15lhm7nQ_Q=.3a4a029c-bc04-4d0b-a501-a67775c1591b@github.com>

On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Set condition flags correctly after fast-lock call on aarch64

src/hotspot/share/runtime/synchronizer.cpp line 500:

> 498:       // Try to swing into 'fast-locked' state without inflating.
> 499:       markWord locked_header = header.set_fast_locked();
> 500:       markWord witness = obj()->cas_set_mark(locked_header, header);

Nit: really dislike the name witness - it is just the old header value.

src/hotspot/share/runtime/synchronizer.cpp line 516:

> 514:     // No room on the lock_stack so fall-through to inflate-enter.
> 515:   } else {
> 516:     markWord mark = obj->mark();

why is it `mark` here but `header` above?
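The snippet under discussion tries to swing the header into the fast-locked state with a single compare-and-swap. A minimal sketch of that pattern with `std::atomic` follows. It is not the code from the PR: `ObjectModel`, `try_fast_lock`, `fast_unlock`, and the bit constants are hypothetical names, and the only encoding taken from the description is that the low two header bits are 00 when fast-locked.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical header model: the low two bits carry the lock state.
constexpr std::uintptr_t kLockMask = 0x3;
constexpr std::uintptr_t kUnlocked = 0x1;    // assumed encoding for "unlocked"
constexpr std::uintptr_t kFastLocked = 0x0;  // "00", as in the PR description

struct ObjectModel {
  std::atomic<std::uintptr_t> header{kUnlocked};
};

// Attempts to fast-lock obj and returns true on success. On failure a real
// implementation would fall through to the slow path (inflate and enter).
inline bool try_fast_lock(ObjectModel& obj) {
  std::uintptr_t old_header = obj.header.load(std::memory_order_relaxed);
  if ((old_header & kLockMask) != kUnlocked) {
    return false;  // already fast-locked or in some other state
  }
  const std::uintptr_t locked_header = (old_header & ~kLockMask) | kFastLocked;
  // On failure compare_exchange_strong stores the value it observed into
  // old_header; that observed value plays the role of "witness" above.
  return obj.header.compare_exchange_strong(old_header, locked_header,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);
}

// Restores the unlocked bit pattern while leaving the upper bits untouched.
inline void fast_unlock(ObjectModel& obj) {
  const std::uintptr_t h = obj.header.load(std::memory_order_relaxed);
  assert((h & kLockMask) == kFastLocked);
  obj.header.store((h & ~kLockMask) | kUnlocked, std::memory_order_release);
}
```

Naming the CAS result after what it holds (the previously observed header) is the point of the nit: the success test is simply whether that value equals the expected one.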
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147173480
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147174724

From dholmes at openjdk.org Fri Mar 24 06:54:52 2023
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 24 Mar 2023 06:54:52 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID: <_zupTz0CFtTIcMh1qjWTXj2ro3DQIeT3iWDuKLbI2Yg=.494a5726-857a-4ac4-83d4-c8d0112b514d@github.com>

On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Set condition flags correctly after fast-lock call on aarch64

src/hotspot/share/runtime/synchronizer.cpp line 761:

> 759:   markWord mark = obj->mark();
> 760:   if ((mark.is_fast_locked() && current->lock_stack().contains(obj())) ||
> 761:       (mark.has_locker() && current->is_lock_owned((address)mark.locker()))) {

I'd prefer to see this conditionalized on UseFastLocking so that it is more obvious where the fast-locking changes are. Ditto for other places.
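One possible reading of this suggestion is to branch on the flag first, so that each locking scheme gets its own clearly marked path. The sketch below models that shape only. `OwnershipQuery` and `current_thread_holds_lock` are hypothetical stand-ins for the VM flag and the four predicates in the quoted condition, and the rewrite matches the original `||` form only under the assumption that a header is never fast-locked when the flag is off and never stack-locked when it is on.

```cpp
#include <cassert>

// Hypothetical stand-ins for the VM flag and the predicates in the condition.
struct OwnershipQuery {
  bool use_fast_locking;        // models the UseFastLocking flag
  bool mark_is_fast_locked;     // models mark.is_fast_locked()
  bool lock_stack_contains;     // models current->lock_stack().contains(obj())
  bool mark_has_locker;         // models mark.has_locker()
  bool thread_owns_stack_lock;  // models current->is_lock_owned(...)
};

// Flag-first structure: the fast-locking branch is visible at a glance and
// the legacy stack-locking branch is left exactly as it was.
inline bool current_thread_holds_lock(const OwnershipQuery& q) {
  if (q.use_fast_locking) {
    return q.mark_is_fast_locked && q.lock_stack_contains;
  }
  return q.mark_has_locker && q.thread_owns_stack_lock;
}
```

The trade-off is a little more code in exchange for being able to find every fast-locking change by searching for the flag.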
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147182098

From dholmes at openjdk.org Fri Mar 24 06:58:53 2023
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 24 Mar 2023 06:58:53 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Set condition flags correctly after fast-lock call on aarch64

src/hotspot/share/runtime/threads.cpp line 1422:

> 1420: }
> 1421:
> 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, oop obj) {

Is this thread-safe?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147184493

From dholmes at openjdk.org Fri Mar 24 07:03:55 2023
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 24 Mar 2023 07:03:55 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation.
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed.
This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ?
>> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Set condition flags correctly after fast-lock call on aarch64 > The lock-stack is grown when needed.
Could you update the description of the PR with the latest approach please - others are unlikely to read all the comments to realize this has changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1482342876 From jbachorik at openjdk.org Fri Mar 24 07:31:32 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Fri, 24 Mar 2023 07:31:32 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Thu, 23 Mar 2023 12:45:27 GMT, Johannes Bechberger wrote: > Yes, but while JFR interrupts threads too, its sampler runs in its own thread, so the async-safety of the interrupted code should not matter, or? And this sampling approach is quite subpar when compared with the signal based profiling - both in terms of bias and precision. Meaning that if/when JFR would implement a state-of-the-art CPU profiler it will have to deal with the same issues we are seeing here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1482370219 From gcao at openjdk.org Fri Mar 24 07:48:51 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 24 Mar 2023 07:48:51 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v12] In-Reply-To: <2STtfSR29CesdgF3zkeMQBqkHjZk32Yqss0NURZDHEI=.f3d8c313-a0f1-4282-b215-59f5a948b2e8@github.com> References: <2STtfSR29CesdgF3zkeMQBqkHjZk32Yqss0NURZDHEI=.f3d8c313-a0f1-4282-b215-59f5a948b2e8@github.com> Message-ID: On Thu, 23 Mar 2023 15:37:27 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Andrew comments aarch64 Thanks, the riscv port passed the tier1-3 tests on unmatched board with no new errors introduced. ------------- Marked as reviewed by gcao (Author). PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1356148354 From stuefe at openjdk.org Fri Mar 24 07:51:00 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 Mar 2023 07:51:00 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 07:00:35 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > >> The lock-stack is grown when needed. > > Could you update the description of the PR with the latest approach please - others are unlikely to read all the comments to realize this has changed. @dholmes-ora > Is this thread-safe? 
I don't think so, but would the stacklock variant owning_thread_from_monitor_owner not suffer from the same problem? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1482386684 From stuefe at openjdk.org Fri Mar 24 07:57:32 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 Mar 2023 07:57:32 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Fri, 24 Mar 2023 07:28:27 GMT, Jaroslav Bachorik wrote: > > Yes, but while JFR interrupts threads too, its sampler runs in its own thread, so the async-safety of the interrupted code should not matter, or? > > And this sampling approach is quite subpar when compared with the signal based profiling - both in terms of bias and precision. Meaning that if/when JFR would implement a state-of-the-art CPU profiler it will have to deal with the same issues we are seeing here. Not if they run the walker outside the sampled thread. Which would be much safer in a lot of ways. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1482392946 From jbechberger at openjdk.org Fri Mar 24 09:49:32 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Fri, 24 Mar 2023 09:49:32 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: On Fri, 24 Mar 2023 07:54:56 GMT, Thomas Stuefe wrote: > Not if they run the walker outside the sampled thread. Which would be much safer in a lot of ways. Yes, it would, but it is not currently possible with ASGCT, albeit changing it would be quite simple (not changing existing use cases or the API, but would require more tests). I would rather add this to the new JEP. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1482523335 From jbechberger at openjdk.org Fri Mar 24 10:24:36 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Fri, 24 Mar 2023 10:24:36 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3] In-Reply-To: References: Message-ID: <-sKa3prVSdWYycIfDQo9I1cikLcAx03Du9ScH3g8k-M=.f6a68864-d5f5-4522-be7b-b62f2d666668@github.com> On Thu, 23 Mar 2023 08:20:18 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use raw_thread instead of Thread::current() I tested the execution time of individual ASGCT calls on the renaissance benchmark using [asgct_perf_test](https://github.com/parttimenerd/asgct_perf_test) which I created for this purpose. TLDR: No relevant performance difference when disabling cache modification for ASGCT via a field in `Thread`. On Linux (64 core machine), all timings are in microseconds. With the disabled caches we get:

bucket   %    count     min   mean  max        std    std/mean  median  90th  99th
overall  100  81265409  0.04  4.07  59334.88   32.23  7.93      2.62    7.14  23.20

vs with current head OpenJDK:

overall  100  81607301  0.03  3.96  83194.84   30.49  7.69      2.57    7.00  22.27

The difference between the averages is just 110 ns, which should be undetectable in real life. With the disabled caches on Mac M1 we get:

overall  100  39281484  0.00  1.39  94833.66   36.48  26.21     0.92    2.08  9.46

vs with the current JDK and [the fix adapted from async-profiler](https://github.com/async-profiler/async-profiler/blob/e3b7bfca227ae5c916f00abfacf0e61291df3a67/src/profiler.cpp#L383) which sets the WX mode:

overall  100  37725188  0.00  1.40  110634.45  29.39  20.92     0.92    2.17  10.21

Which doesn't alter the performance characteristics.
I'm therefore pushing my new change to this PR, which prevents ASGCT from modifying any PcDesc cache. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1482570550 From jbechberger at openjdk.org Fri Mar 24 10:35:36 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Fri, 24 Mar 2023 10:35:36 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v4] In-Reply-To: References: Message-ID: > Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. > > Tested on my M1 mac. Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Remove misc lines - Disable caching in ASGCT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13144/files - new: https://git.openjdk.org/jdk/pull/13144/files/22b661dd..1973e005 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=02-03 Stats: 33 lines in 3 files changed: 27 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13144/head:pull/13144 PR: https://git.openjdk.org/jdk/pull/13144 From aph at openjdk.org Fri Mar 24 10:40:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 24 Mar 2023 10:40:45 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v12] In-Reply-To: <2STtfSR29CesdgF3zkeMQBqkHjZk32Yqss0NURZDHEI=.f3d8c313-a0f1-4282-b215-59f5a948b2e8@github.com> References: <2STtfSR29CesdgF3zkeMQBqkHjZk32Yqss0NURZDHEI=.f3d8c313-a0f1-4282-b215-59f5a948b2e8@github.com> Message-ID: On Thu, 23 Mar 2023 15:37:27 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Andrew comments aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2348: > 2346: address entry = CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_from_cache); > 2347: __ mov(method, code); // this is essentially Bytecodes::_invokedynamic > 2348: __ call_VM(noreg, entry, method); // Example uses temp = rbx. In this case rbx is method What is `rbx` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1147401509 From aph at openjdk.org Fri Mar 24 10:48:49 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 24 Mar 2023 10:48:49 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v12] In-Reply-To: <2STtfSR29CesdgF3zkeMQBqkHjZk32Yqss0NURZDHEI=.f3d8c313-a0f1-4282-b215-59f5a948b2e8@github.com> References: <2STtfSR29CesdgF3zkeMQBqkHjZk32Yqss0NURZDHEI=.f3d8c313-a0f1-4282-b215-59f5a948b2e8@github.com> Message-ID: On Thu, 23 Mar 2023 15:37:27 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Andrew comments aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2322: > 2320: } > 2321: } > 2322: We need a comment here: incoming args, result (what?) in LR. Also say what regs are clobbered.
I guess because there's a `call_vm()` that's all call-clobbered regs. But saves `bcp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1147407483 From shade at openjdk.org Fri Mar 24 11:54:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 24 Mar 2023 11:54:09 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ?
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Set condition flags correctly after fast-lock call on aarch64 Cursory review follows. src/hotspot/cpu/aarch64/aarch64.ad line 3848: > 3846: __ bind(slow); > 3847: __ tst(oop, oop); // Force ZF=0 to indicate failure and take slow-path. We know that oop != null. > 3848: __ b(no_count); Is this a micro-optimization? I think we can simplify the code by just setting the flags here and then jumping into the usual `__ b(cont)`. This would make the move of `__ b(cont)` unnecessary below. src/hotspot/cpu/aarch64/aarch64.ad line 3954: > 3952: // Indicate success on completion. > 3953: __ cmp(oop, oop); > 3954: __ b(count); `aarch64_enc_fast_lock` explicitly sets NE on failure path. But this code just jumps to `no_count` without setting the flag. Does the code outside this encoding block rely on flags? 
src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 90: > 88: Label done; > 89: // and mark it as unlocked > 90: orr(hdr, hdr, markWord::unlocked_value); Indentation: Suggestion: orr(hdr, hdr, markWord::unlocked_value); src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 322: > 320: // object is already locked (xhandlers expect object to be unlocked) > 321: CodeEmitInfo* info = state_for(x, x->state(), true); > 322: LIR_Opr tmp = UseFastLocking ? new_register(T_INT) : LIR_OprFact::illegalOpr; Is it really `T_INT`, not `T_ADDRESS`? I guess it follows the definition of `LIR_Opr lock` above, but maybe that one is accidentally working because lock addresses fit in 32 bit? I wonder what that `tmp` would be used for, maybe 64-bit math would break with it? (I separately notice that `syncTempOpr()` is just `rax`, so ) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 611: > 609: bind(slow); > 610: testptr(objReg, objReg); // force ZF=0 to indicate failure > 611: jmp(NO_COUNT); We set a flag on failure (`NO_COUNT`) path. Should we set the flag on success (`COUNT`) path as well? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 926: > 924: mov(boxReg, tmpReg); > 925: fast_unlock_impl(objReg, boxReg, tmpReg, NO_COUNT); > 926: jmp(COUNT); Do we need to care about returning proper flags here? src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9700: > 9698: // If successful, push object to lock-stack. > 9699: movl(tmp, Address(thread, JavaThread::lock_stack_offset_offset())); > 9700: movptr(Address(thread, tmp, Address::times_1), obj); Minor: I think `Address::times_1` is unnecessary in cases like this. (You might want to grep for other instances of this). 
src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1714:

> 1712: // Save the test result, for recursive case, the result is zero
> 1713: __ movptr(Address(lock_reg, mark_word_offset), swap_reg);
> 1714: __ jcc(Assembler::notEqual, slow_path_lock);

Indenting:

Suggestion:

    __ jcc(Assembler::notEqual, slow_path_lock);

src/hotspot/share/runtime/lockStack.cpp line 42:

> 40:
> 41: #ifndef PRODUCT
> 42: void LockStack::validate(const char* msg) const {

Would you also like to check there are no `nullptr` elements on stack here?

src/hotspot/share/runtime/lockStack.hpp line 42:

> 40: // We do this instead of a simple index into the array because this allows for
> 41: // efficient addressing in generated code.
> 42: int _offset;

This field is accessed from compiled code, we better use a strict-width datatype like `uint32_t`, or maybe even `uint16_t` here?

src/hotspot/share/runtime/lockStack.inline.hpp line 54:

> 52: assert(to_index(_offset) > 0, "underflow, probably unbalanced push/pop");
> 53: _offset -= oopSize;
> 54: oop o = _base[to_index(_offset)];

I think `pop` and `remove` might benefit from zapping the removed elements in the stack, just to make sure we don't accidentally read them? `validate` can check that everything beyond `end` is zapped, and there are no zapped elements on stack?

src/hotspot/share/runtime/synchronizer.cpp line 315:

> 313: const markWord mark = obj->mark();
> 314:
> 315: if ((mark.is_fast_locked() && current->lock_stack().contains(oop(obj))) ||

`cast_to_oop(obj)` instead of `oop(obj)`?

src/hotspot/share/runtime/synchronizer.cpp line 923:

> 921: static bool is_lock_owned(Thread* thread, oop obj) {
> 922: assert(UseFastLocking, "only call this with fast-locking enabled");
> 923: return thread->is_Java_thread() ? reinterpret_cast(thread)->lock_stack().contains(obj) : false;

Here and later, should use `JavaThread::cast(thread)` instead of `reinterpret_cast`? It also sometimes subsumes the asserts, as `JT::cast` checks the type.
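For illustration only, the zapping and checked-cast suggestions above could look roughly like the following standalone sketch. It is not HotSpot code: `MiniLockStack`, `MiniThread`, `MiniJavaThread`, `ZAP_VALUE` and the fixed capacity of 8 are all made up for this example, a plain `const void*` stands in for `oop`, and `validate` returns a bool instead of asserting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical poison value written into free slots (debug aid only).
static const void* const ZAP_VALUE =
    reinterpret_cast<const void*>(static_cast<std::uintptr_t>(0xBAADBABE));

// Simplified stand-in for a per-thread lock stack.
class MiniLockStack {
  static constexpr int CAPACITY = 8;  // hypothetical fixed size
  const void* _base[CAPACITY];
  int _top;                           // index of the next free slot
public:
  MiniLockStack() : _top(0) {
    for (int i = 0; i < CAPACITY; i++) _base[i] = ZAP_VALUE;
  }
  void push(const void* o) {
    assert(_top < CAPACITY && "overflow");
    _base[_top++] = o;
  }
  const void* pop() {
    assert(_top > 0 && "underflow, probably unbalanced push/pop");
    const void* o = _base[--_top];
    _base[_top] = ZAP_VALUE;          // zap the slot that was just freed
    return o;
  }
  bool contains(const void* o) const {
    for (int i = 0; i < _top; i++) {
      if (_base[i] == o) return true;
    }
    return false;
  }
  // True if no live slot is null or zapped and every free slot is zapped.
  bool validate() const {
    if (_top < 0 || _top > CAPACITY) return false;
    for (int i = 0; i < _top; i++) {
      if (_base[i] == nullptr || _base[i] == ZAP_VALUE) return false;
    }
    for (int i = _top; i < CAPACITY; i++) {
      if (_base[i] != ZAP_VALUE) return false;
    }
    return true;
  }
};

// Simplified stand-ins for the thread class hierarchy.
class MiniThread {
public:
  virtual ~MiniThread() {}
  virtual bool is_java_thread() const { return false; }
};

class MiniJavaThread : public MiniThread {
  MiniLockStack _lock_stack;
public:
  bool is_java_thread() const override { return true; }
  MiniLockStack& lock_stack() { return _lock_stack; }
  // Checked downcast: verifies the dynamic type before converting.
  static MiniJavaThread* cast(MiniThread* t) {
    assert(t != nullptr && t->is_java_thread() && "incorrect cast");
    return static_cast<MiniJavaThread*>(t);
  }
};

// Ownership query in the spirit of the reviewed is_lock_owned().
bool is_lock_owned(MiniThread* thread, const void* obj) {
  return thread->is_java_thread()
      ? MiniJavaThread::cast(thread)->lock_stack().contains(obj)
      : false;
}
```

The checked cast keeps the type assertion in one place, and the zapped free slots give `validate` something concrete to check after every `pop`.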
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ObjectMonitor.java line 83: > 81: > 82: public boolean isOwnedAnonymous() { > 83: return addr.getAddressAt(ownerFieldOffset).asLongValue() == 1; This `1` should be `ANONYMOUS_OWNER` constant, I think. So that textual search would hit both `oop.hpp` and this file. ------------- PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1356303825 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147415098 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147411462 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147416146 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147433422 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147441662 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147444139 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147452034 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147452841 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147463354 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147319821 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147469524 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147478054 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147374433 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147479959 From dholmes at openjdk.org Fri Mar 24 12:12:03 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Mar 2023 12:12:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 10:10:58 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request 
incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/share/runtime/synchronizer.cpp line 923: > >> 921: static bool is_lock_owned(Thread* thread, oop obj) { >> 922: assert(UseFastLocking, "only call this with fast-locking enabled"); >> 923: return thread->is_Java_thread() ? reinterpret_cast(thread)->lock_stack().contains(obj) : false; > > Here and later, should use `JavaThread::cast(thread)` instead of `reinterpret_cast`? It also sometimes subsumes the asserts, as `JT::cast` checks the type. Only JavaThreads can own monitors so this function should take a JavaThread in the first place. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147499577 From dholmes at openjdk.org Fri Mar 24 12:18:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Mar 2023 12:18:06 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 07:00:35 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > >> The lock-stack is grown when needed. > > Could you update the description of the PR with the latest approach please - others are unlikely to read all the comments to realize this has changed. > @dholmes-ora > > > Is this thread-safe? > > I don't think so, but would the stacklock variant owning_thread_from_monitor_owner not suffer from the same problem? @tstuefe Yes but that code has already had its thread-safety properties determined (presumably) long ago. 
Checking whether an address is within a thread's stack is pretty thread-safe. The new code needs to ensure that iteration with `contains` is safe in the face of a concurrent push/pop/remove.

Or it may be these functions are only called at a safepoint? If so there should be an assert in that case, so I presume that is not the case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1482707476

From matsaave at openjdk.org Fri Mar 24 15:13:40 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Fri, 24 Mar 2023 15:13:40 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v13]
In-Reply-To:
References:
Message-ID:

> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>
> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>
> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>
> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Improved interpreter comments aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/546291fc..ff7f3503 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=11-12 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From stuefe at openjdk.org Fri Mar 24 17:00:47 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 Mar 2023 17:00:47 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 11:36:24 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/share/runtime/lockStack.cpp line 42: > >> 40: >> 41: #ifndef PRODUCT >> 42: void LockStack::validate(const char* msg) const { > > Would you also like to check there are no `nullptr` elements on stack here? Please also verify against over- and underflow, and better than just null checks check that every oop really is an oop. 
I added this to my code:

    assert((_offset <= end_offset()), "lockstack overflow: _offset %d end_offset %d", _offset, end_offset());
    assert((_offset >= start_offset()), "lockstack underflow: _offset %d start_offset %d", _offset, start_offset());
    int end = to_index(_offset);
    for (int i = 0; i < end; i++) {
      assert(oopDesc::is_oop(_base[i]), "index %i: not an oop (" PTR_FORMAT ")", i, p2i(_base[i]));
    ...

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1147841666

From stuefe at openjdk.org Fri Mar 24 17:15:07 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 24 Mar 2023 17:15:07 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked').
The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor.
Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Set condition flags correctly after fast-lock call on aarch64

I'm currently hitting a roadblock on the arm port, where

> > @dholmes-ora
> > > Is this thread-safe?
> >
> >
> > I don't think so, but would the stacklock variant owning_thread_from_monitor_owner not suffer from the same problem?
>
> @tstuefe Yes but that code has already had its thread-safety properties determined (presumably) long ago. Checking whether an address is within a thread's stack is pretty thread-safe. The new code needs to ensure that iteration with `contains` is safe in the face of a concurrent push/pop/remove.
>
> Or it may be these functions are only called at a safepoint? If so there should be an assert in that case, so I presume that is not the case.

This is a good question.
It's not bound to a safepoint, but it is used in two places atm:

- JVMTI GetObjectMonitorUsage
- Thread dump facility, monitor display

AFAIU, these places now could get outdated information from the `contains` function: either getting a false positive or a false negative when asking for a given object. But if I understand correctly, this had been the case before, too, since the information about ownership may be outdated as soon as one reads the BasicLock* address from the mark word. So these APIs do only report a state that once had been, but never the current state.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1483144364

From adinn at redhat.com Fri Mar 24 17:21:15 2023
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 24 Mar 2023 17:21:15 +0000
Subject: Disallowing the dynamic loading of agents by default
In-Reply-To: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com>
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com>
Message-ID:

Hi Ron,

Thank you for providing a heads up on the proposed JEP. The Red Hat Java team have been discussing this proposal. We have reviewed the original discussion and also the surrounding debate which established requirements for adaptation of Jigsaw to incorporate the needs of agents.

As an aside, I'll note that a thorough review was necessary /even/ in my case, despite the fact that I was an active party, because the discussion occurred, and corresponding decisions were made, quite some time ago. I mention this because it may explain the air of surprise and the desire to reiterate some of the original debate on the part of some respondents in this thread, who perhaps were not party, or only tangentially party, to the discussion. That also suggests that there may be a lot of users who are not aware that the -XX:+EnableDynamicAgentLoading switch exists or do not really understand why it exists, i.e. that there is a broad education issue at play here.
We do have some concerns about the JEP, specifically about the timing of its delivery. These are probably best addressed via the normal review process. In particular that will ensure the discussion happens in a more suitable and more widely subscribed forum than the Jigsaw list. However, I will briefly mention our concerns in this reply. Before that let me start with a few disclaimers: - We acknowledge that there is little to be gained from re-iterating arguments made in the previous discussion (although that does not imply the JEP review would not benefit from new arguments, especially from those who were not involved in that discussion) - We recognize that the purpose of the -XX:+EnableDynamicAgentLoading switch is to offer a platform integrity guarantee and that this change of the default reflects a desire to prioritise integrity over the flexibility that agents provide - We recognize that the proposal is only proposing to flip a configuration default rather than detract from (or modify) available functionality - We recognize that changing this default will still allow (*most*) users to configure the behaviour they desire - We recognize that this advance notice has been given precisely to ensure that anyone wishing to deploy on jdk21 an app that relies on use of agents has time to plan appropriate configuration for their deployment - We recognize that this change of default is not being proposed for backport and hence that it will largely only affect the relatively small number of users who are currently developing for jdk21+ So, given that as a base for our comments where is the beef? - Our main concern is, predictably, timing. Clearly, this is a future, potential problem rather than a present problem - no one can be deploying on jdk21 yet and most developers who are currently preparing an app for deployment on jdk21+ will likely encounter the effect of this change before actual deployment and be in a position to remedy it. 
The concern is that advertising a change like this and getting users prepared to respond to it has always been difficult to achieve. In particular we expect a long tail of support problems from users who are trying to upgrade deployments from earlier releases to jdk21. So, while it is nice to have such early notice of the proposal we plan to review its likely impact on our users and how much time we need to prepare ourselves and our users to negotiate this change in behaviour. Any evidence we obtain to suggest a delay in targeting is appropriate will be brought to the JEP review. - A second, related concern is that flipping the default for this configuration in an LTS release as the first exposure to it for most people is more likely to derail deployment plans for users than if the default were flipped in a non-LTS release. If this change were deferred to jdk22 then that would give those planning deployment on (or upgrade to) jdk25 and also those planning to upgrade from jdk17 to jdk21 more time to discover and respond to the change. - A third concern, already pointed out by Volker, is that some users may run their Java apps via launcher apps or scripts that mask access to the Java command line. For such users the change of default may mean that they lose the option to deploy dynamic agents for important ancillary tasks such as observability. We are not clear how many of our users this affects but we will be looking into this and hope to bring feedback to the JEP review. Obviously, this problem can be remedied relatively easily by the supplier of the launcher enabling agent use or providing a suitable control switch. Our concern is not with how to solve this problem rather how the involvement of two parties, supplier and end user, might imply a need for the JEP to be targeted to a later release. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From coleenp at openjdk.org Fri Mar 24 18:57:41 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 24 Mar 2023 18:57:41 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v13]
In-Reply-To:
References:
Message-ID:

On Fri, 24 Mar 2023 15:13:40 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>>
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
> Improved interpreter comments aarch64

Still looks good to me.

-------------

Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1357273317 From cjplummer at openjdk.org Fri Mar 24 19:20:38 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 24 Mar 2023 19:20:38 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: <0J5IbRtwgK7aOiXWfWBSVLo0RniWf6rRfvqmqA59j5A=.a7cf4ca2-a750-4281-9c78-285ca58e21da@github.com> References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> <0J5IbRtwgK7aOiXWfWBSVLo0RniWf6rRfvqmqA59j5A=.a7cf4ca2-a750-4281-9c78-285ca58e21da@github.com> Message-ID: On Fri, 24 Mar 2023 00:23:19 GMT, Serguei Spitsyn wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 68: >> >>> 66: if (n <= 0) { >>> 67: n = 1000; >>> 68: ToggleNotifyJvmtiTest.sleep(1); >> >> It looks like you do this short sleep 1 out of every 1,000,000 iterations. Can you explain why? > > It is for yielding. Do you think we need this with a bigger frequency? I guess the question then is why the need to yield. It just seems a bit odd that I thought the main point of this loop was to keep busy calling `breakpointCheck()`, and I don't see how doing a yield 1 out of every 1,000,000 iterations relates to that. >> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 72: >> >>> 70: if (i > n) { >>> 71: i = 0; >>> 72: n = n - 1; >> >> n-- > > This code was copied originally from the vmTestbase to SuspendThread* tests and some other tests. > I can do all suggested simplifications but not sure if it is really necessary. > It does not matter what exactly the method does because it just simulates some thread activity. > Would it better to keep copy/pasted methods the same? 
ok >> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 148: >> >>> 146: >>> 147: static private void setVirtualThreadsNotifyJvmtiMode(int iter, boolean enable) { >>> 148: sleep(5); >> >> Why is this needed? > > It is needed to balance enabling/disabling notifyJvmti mode with the ThreadStart/VirtualThreadStart events. > Otherwise, many mode switches can be observed without any events which is not interesting. > We need to allow virtual threads to execute a little bit after a mode switch. Shouldn't that be the caller's responsibility? Including a comment would be helpful. >> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 161: >> >>> 159: vm.loadAgentLibrary(AGENT_LIB, arg); >>> 160: } else { >>> 161: System.loadLibrary(AGENT_LIB); >> >> Why is this needed? Isn't the library already loaded due to it being specified by `-agentlib`? > > Good question. We almost always do it in the JVMTI tests including `serviceability/jvmti/vthread` and `vmTestbase/nsk/jvmti` tests. Examples are 22 `serviceability/jvmti/vthread` tests. Are you saying it's not needed, but you included it to be consistent with other tests? >> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 171: >> >>> 169: sleep(20); >>> 170: >>> 171: for (int iter = 0; VirtualThreadStartedCount() < VTHREADS_CNT; iter++) { >> >> The test seems to exit once all the threads are started. I would think you would want it to run for a while after all the threads are started. > > I'm not sure if it is really needed. 60 virtual threads are started. > Some of them are executed long enough before shutdown. > We can just increase the number of threads if necessary. 
ok

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147966083
PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147966677
PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147969519
PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147969967
PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1147970566

From cjplummer at openjdk.org Fri Mar 24 21:39:34 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Fri, 24 Mar 2023 21:39:34 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

On Fri, 24 Mar 2023 06:31:14 GMT, Leonid Mesnik wrote:

>> The TestScaffold incorrectly parses options; it should insert the wrapper class between the VM options and the application's classname.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
> added comments and trim arguments

Changes look good. I was hoping it would fix more of the failing tests than it did. I guess we'll need to take a closer look at them. Would be good to eventually diagnose the root cause of the failures and get bugs filed for each category of failure.

test/jdk/com/sun/jdi/TestScaffold.java line 520:

> 518: argInfo.targetAppCommandLine = TestScaffold.class.getName() + ' '
> 519: + mainWrapper + ' ' + argInfo.targetAppCommandLine;
> 520: argInfo.targetVMArgs += "--enable-preview ";

It looks like previously we ignored `main.wrapper` if not set to `Virtual`, but with your changes we accept any setting. That's ok, but `--enable-preview` is really only needed when set to `Virtual`.

-------------

Marked as reviewed by cjplummer (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13170#pullrequestreview-1357465403
PR Review Comment: https://git.openjdk.org/jdk/pull/13170#discussion_r1148078319

From lmesnik at openjdk.org Fri Mar 24 21:47:30 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 24 Mar 2023 21:47:30 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

On Fri, 24 Mar 2023 06:31:14 GMT, Leonid Mesnik wrote:

>> The TestScaffold incorrectly parses options; it should insert the wrapper class between the VM options and the application's classname.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
> added comments and trim arguments

The non-Virtual modes are Kernel (should be Platform) and None. They are mostly used to find out whether a test fails because of the wrapper or because of virtual threads. They are not used in testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1483450424

From aph at openjdk.org Fri Mar 24 22:31:40 2023
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 24 Mar 2023 22:31:40 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v13]
In-Reply-To:
References:
Message-ID:

On Fri, 24 Mar 2023 15:13:40 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Improved interpreter comments aarch64 AArch64 looks good. Thank you. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1357525728 From greggwon at cox.net Mon Mar 20 13:42:51 2023 From: greggwon at cox.net (Gregg Wonderly) Date: Mon, 20 Mar 2023 08:42:51 -0500 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <0737FC9F-629E-41E4-BD56-8955B6142FC8@oracle.com> Message-ID: <08242F6D-170B-43BD-B7B6-7E4DFB7B4662@cox.net> After all, we do know that Oracle in fact knows about every single Java application, where it runs, where it's deployed and what the future plans are for the same. Otherwise, how else could they know what changes need to be made in the platform, right? Gregg Wonderly > On Mar 20, 2023, at 5:10 AM, Ron Pressler wrote: > > Hi. > > The majority of serviceability tools don't require dynamically loading an agent, and the majority of applications never load an agent dynamically.
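For readers joining the thread here, "dynamically loading an agent" refers to the Attach API path, as opposed to `-javaagent`/`-agentlib` at startup. The following is only a rough sketch of that path plus a check for whether the flag named in this thread was given explicitly on the command line. The class and method names (`DynamicAgentCheck`, `explicitSetting`, `loadAgent`) are made up for illustration; the treatment of repeated flags (last one wins) is an assumption, and the check only sees what `RuntimeMXBean.getInputArguments()` reports, so it says nothing about the JDK's default.

```java
import com.sun.tools.attach.VirtualMachine;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

class DynamicAgentCheck {
    // Flag name taken from the thread; everything else here is illustrative.
    static final String FLAG = "EnableDynamicAgentLoading";

    // Returns the explicit setting found in the given JVM arguments,
    // or empty if the flag was not given (the JDK default then applies).
    // Assumption: if the flag is repeated, the last occurrence wins.
    static Optional<Boolean> explicitSetting(List<String> jvmArgs) {
        Optional<Boolean> setting = Optional.empty();
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + FLAG)) {
                setting = Optional.of(true);
            } else if (arg.equals("-XX:-" + FLAG)) {
                setting = Optional.of(false);
            }
        }
        return setting;
    }

    // The dynamic path under discussion: attach to a running JVM by pid
    // and ask it to load a Java agent JAR. Not invoked by main below.
    static void loadAgent(String pid, String agentJar) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.loadAgent(agentJar);
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(FLAG + " explicitly set: " + explicitSetting(jvmArgs));
    }
}
```

A launcher script or wrapper could run a check like `explicitSetting` to report whether the target JVM was started with the flag before a tool attempts `loadAgent`.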
> > True, there are some tools that will be affected, which is why the decision was to introduce the flag in JDK 9 and to announce this change, but change the default in a later version to give tools ample time to prepare their users. The rationale for this change then hasn't changed, but will be reiterated in a JEP (we just wanted to announce this ahead of the JEP to give tool authors another reminder more than six months ahead of JDK 21). The only change between then and now is that even fewer use cases require dynamically loaded agents, and so the impact is even smaller. > > It is also true that, when starting an application you don't know that you *will* need to load an agent, but in most situations you know that you might. E.g. processes that are too critical to bring down even for deep maintenance (although not many of these are written in modern versions of Java anyway) or canary services that are under trial. The relatively few sophisticated users who know how to write ad-hoc agents can even opt to enable dynamic agent loading on all their servers; these users are better equipped to weigh the risks and tradeoffs involved. > > Finally, some tools that require a dynamically loaded JVM TI agent, such as profilers that profile native code, are so tied to the VM's internals that the best place for them is in the JDK. If anything, the bigger problem is not that profilers are used too much in production, but too little, including less advanced ones that don't require an agent. There is plenty of time to enhance the JDK's built-in profiling capabilities ahead of demand. > > -- Ron > >> On 20 Mar 2023, at 01:21, Andrei Pangin wrote: >> >> Hi all, >> >> Serviceability has been one of the biggest Java strengths, but the proposed change is going to have a large negative impact on it. >> >> Disallowing dynamic agents by default means it will no longer be possible to attach a profiler to a running app at runtime.
JFR cannot close this gap because it lacks capabilities that modern Java profilers have (that's a separate topic though). >> >> When an issue happens with a live app, it's already too late to add a command line argument. Furthermore, it may not even be feasible to add an agent at startup in containerized applications. Starting a profiler on demand from the host OS or from a sidecar is the only viable solution in these cases. >> >> Next, it's hard to predict beforehand what tools exactly might be useful for troubleshooting: e.g., one tool may be better for finding memory leaks, a different one for analyzing CPU performance. Adding all possible tools at startup does not seem a reasonable approach, especially when tools may conflict with each other. >> >> The most important aspect of dynamic agents is the possibility to make a special tool just in time for solving a particular problem. A typical example is to get a value of some field in a live app without dumping the entire 60 GB heap. Another common use case is hot patching for fixing trivial bugs or for adding debug logs dynamically. A prominent example is how dynamic agents proved an irreplaceable aid in addressing the notorious log4j vulnerabilities CVE-2021-44228 and CVE-2021-45046. >> >> I would be grateful to know more about the reasons why we should give up all the above advantages of dynamic agents in their good and legitimate use cases. >> >> Thank you, >> Andrei >> >> On Thu, 16 Mar 2023 at 18:48, Ron Pressler wrote: >>> Hi. >>> >>> In JDK 21 we intend to disallow the dynamic loading of agents by default. This >>> will affect tools that use the Attach API to load an agent into a JVM some time >>> after the JVM has started [1]. There is no change to any of the mechanisms that >>> load an agent at JVM startup (-javaagent/-agentlib on the command line or the >>> Launcher-Agent-Class attribute in the main JAR's manifest). >>> >>> This change in default behavior was proposed in 2017 as part of JEP 261 [2][3].
>>> At that time the consensus was to switch to this default not in JDK 9 but in a >>> later release to give tool maintainers sufficient time to inform their users. >>> To allow the dynamic loading of agents, users will need to specify >>> -XX:+EnableDynamicAgentLoading on the command line. >>> >>> I'll post a draft JEP for review shortly. >>> >>> -- Ron >>> >>> [1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html >>> [2]: https://openjdk.org/jeps/261 >>> [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greggwon at cox.net Fri Mar 24 19:56:32 2023 From: greggwon at cox.net (Gregg Wonderly) Date: Fri, 24 Mar 2023 14:56:32 -0500 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: <33D9DA9C-5539-40FC-8AAF-21926C6CFE9B@cox.net> Lots of people use Java in places where there is no "release" cycle of Java versions under the control of the users. These are "corporate users" in most cases and they have Java applications that they are using which will just "stop working" when a new version of Java is installed. Over the years, I've watched any favoritism towards Java on the desktop or as a general solution programming language wane, because it's undependable as a platform. You never know when something will break as these "stability" changes occur. People who use software systems are in large part not programmers or language/platform experts. The inability of Oracle and many others to understand how detrimental this behavior has been is just mind blowing.
People like myself end up looking like whining babies because we come back every once in a while to see if there is something useful happening in Java development that might finally stabilize the platform on the desktop and other business environments and lo and behold write-once-run-anywhere is found to still be unimplemented and basically unappreciated. It's just a sad, sad thing to see happening. Sun first did this with Java 1.2. The Community beat up on Sun severely and everything quieted down for a while. Then we had the JDK 1.5 release where my much mentioned volatile reachability optimizations broke software all over the place. This is not happening in any other language I am aware of. The people at Sun who were causing all the problems seemed to have gone on to Oracle and there is just a core group of people who just do not understand how horrible Java looks these days because of how much basic functionality got completely broken when a new version of Java showed up on general purpose computing and working software just stopped working... Gregg Wonderly > On Mar 24, 2023, at 12:21 PM, Andrew Dinn wrote: > > Hi Ron, > > Thank you for providing a heads up on the proposed JEP. The Red Hat Java team have been discussing this proposal. We have reviewed the original discussion and also the surrounding debate which established requirements for adaptation of Jigsaw to incorporate the needs of agents. > > As an aside, I'll note that a thorough review was necessary /even/ in my case, despite the fact that I was an active party, because the discussion occurred, and corresponding decisions were made, quite some time ago. I mention this because it may explain the air of surprise and the desire to reiterate some of the original debate on the part of some respondents in this thread, who perhaps were not party, or only tangentially party, to the discussion.
> > That also suggests that there may be a lot users who are not aware that the -XX:+EnableDynamicAgentLoading switch exists or do not really understand why it exists i.e. that there is a broad education issue at play here. > > We do have some concerns about the JEP, specifically about the timing of its delivery. These are probably best addressed via the normal review process. In particular that will ensure the discussion happens in a more suitable and more widely subscribed forum than the Jigsaw list. However, I will briefly mention our concerns in this reply. Before that let me start with a few disclaimers: > > - We acknowledge that there is little to be gained from re-iterating arguments made in the previous discussion (although that does not imply the JEP review would not benefit from new arguments, especially from those who were not involved in that discussion) > > - We recognize that the purpose of the -XX:+EnableDynamicAgentLoading switch is to offer a platform integrity guarantee and that this change of the default reflects a desire to prioritise integrity over the flexibility that agents provide > > - We recognize that the proposal is only proposing to flip a configuration default rather than detract from (or modify) available functionality > > - We recognize that changing this default will still allow (*most*) users to configure the behaviour they desire > > - We recognize that this advance notice has been given precisely to ensure that anyone wishing to deploy on jdk21 an app that relies on use of agents has time to plan appropriate configuration for their deployment > > - We recognize that this change of default is not being proposed for backport and hence that it will largely only affect the relatively small number of users who are currently developing for jdk21+ > > So, given that as a base for our comments where is the beef? > > - Our main concern is, predictably, timing. 
Clearly, this is a future, potential problem rather than a present problem - no one can be deploying on jdk21 yet and most developers who are currently preparing an app for deployment on jdk21+ will likely encounter the effect of this change before actual deployment and be in a position to remedy it. The concern is that advertising a change like this and getting users prepared to respond to it has always been difficult to achieve. In particular we expect a long tail of support problems from users who are trying to upgrade deployments from earlier releases to jdk21. > So, while it is nice to have such early notice of the proposal we plan to review its likely impact on our users and how much time we need to prepare ourselves and our users to negotiate this change in behaviour. Any evidence we obtain to suggest a delay in targeting is appropriate will be brought to the JEP review. > > - A second, related concern is that flipping the default for this configuration in an LTS release as the first exposure to it for most people is more likely to derail deployment plans for users than if the default were flipped in a non-LTS release. If this change were deferred to jdk22 then that would give those planning deployment on (or upgrade to) jdk25 and also those planning to upgrade from jdk17 to jdk21 more time to discover and respond to the change. > > - A third concern, already pointed out by Volker, is that some users may run their Java apps via launcher apps or scripts that mask access to the Java command line. For such users the change of default may mean that they lose the option to deploy dynamic agents for important ancillary tasks such as observability. We are not clear how many of our users this affects but we will be looking into this and hope to bring feedback to the JEP review. > Obviously, this problem can be remedied relatively easily by the supplier of the launcher enabling agent use or providing a suitable control switch. 
Our concern is not with how to solve this problem rather how the involvement of two parties, supplier and end user, might imply a need for the JEP to be targeted to a later release. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From fyang at openjdk.org Sat Mar 25 05:14:45 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 25 Mar 2023 05:14:45 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v13] In-Reply-To: References: Message-ID: On Fri, 24 Mar 2023 15:13:40 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Improved interpreter comments aarch64 Changes requested by fyang (Reviewer). 
src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2344: > 2342: // Compare the method to zero > 2343: __ tst(method, method); > 2344: __ br(Assembler::NE, resolved); Consider using the 'cbnz' instruction here, like: __ cbnz(method, resolved); src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2360: > 2358: #ifdef ASSERT > 2359: __ tst(method, method); > 2360: __ br(Assembler::NE, resolved); Same here, like: __ cbnz(method, resolved); src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 452: > 450: } else { > 451: // Pop N words from the stack > 452: __ get_cache_and_index_at_bcp(x11, x12, 1, index_size); Better to use 'cache' and 'index' in this branch instead of 'x11' and 'x12'. src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2218: > 2216: } > 2217: > 2218: void TemplateTable::load_invokedynamic_entry(Register method) { Please also add a comment for this function, as on aarch64. src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2236: > 2234: // Compare the method to zero > 2235: __ andr(t0, method, method); > 2236: __ bnez(t0, resolved); I think a simpler "__ bnez(method, resolved)" will do. src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2243: > 2241: address entry = CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_from_cache); > 2242: __ mv(method, code); // this is essentially Bytecodes::_invokedynamic > 2243: __ call_VM(noreg, entry, method); // Example uses temp = rbx. In this case rbx is method Remove the code comment here as we don't have rbx on riscv. src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2252: > 2250: #ifdef ASSERT > 2251: __ andr(t0, method, method); > 2252: __ bnez(t0, resolved); Consider "__ bnez(method, resolved)".
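The reasoning behind the cbnz/bnez suggestions above can be checked with a small model. This is only an illustrative sketch in plain Java, not HotSpot code: `tstThenBranchNe` models "AND the register with itself, then branch if the result is non-zero", and `cbnz` models "branch if the register is non-zero". Both names are hypothetical. The two predicates agree for every value because `x & x == x`; the sketch assumes nothing after the branch depends on the condition flags that `tst` would have set.

```java
class BranchEquivalence {
    // Models `tst x, x` followed by a branch on NE:
    // the AND result is x itself, so the branch is taken when x != 0.
    static boolean tstThenBranchNe(long x) {
        return (x & x) != 0;
    }

    // Models a compare-and-branch-on-non-zero of the register directly.
    static boolean cbnz(long x) {
        return x != 0;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 42L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long x : samples) {
            if (tstThenBranchNe(x) != cbnz(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("both forms take the branch for the same inputs");
    }
}
```

On RISC-V the same identity is why the `andr` into `t0` can be dropped: branching on `method` directly gives the same result and leaves the temporary register untouched.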
------------- PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1357775328 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1148305440 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1148305543 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1148305885 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1148306138 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1148306256 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1148306282 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1148306390 From stuefe at openjdk.org Sat Mar 25 08:58:50 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 25 Mar 2023 08:58:50 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header.
This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
> - Set condition flags correctly after fast-lock call on aarch64

I have another question about the asymmetric unlocking code in `InterpreterMacroAssembler::unlock_object`. We go through here for both fast-locked and fat OM locks, right? If so, shouldn't we do the asymmetric lock check only for the fast-locked case? Otherwise the LockStack may be empty, so we compare the word preceding the first slot, which would cause us to always break into the slow case? Sorry if I missed something here.
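To make the lock-stack idea from the quoted PR description, and the empty-stack concern in the question above, a bit more concrete, here is a minimal model in plain Java. It is only a sketch under stated assumptions, not the HotSpot implementation: the class `LockStackSketch` and its methods are hypothetical, the initial capacity of 4 is arbitrary, and object headers, CAS, and GC interaction are left out entirely.

```java
import java.util.Arrays;

class LockStackSketch {
    // Small per-thread array of locked objects; capacity 4 is an arbitrary choice.
    private Object[] slots = new Object[4];
    private int top = 0;

    // Record that the owning thread fast-locked `o`, growing the array if needed.
    void push(Object o) {
        if (top == slots.length) {
            slots = Arrays.copyOf(slots, top * 2);
        }
        slots[top++] = o;
    }

    // "Does the current thread own me?": a short scan by reference identity.
    boolean contains(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == o) {
                return true;
            }
        }
        return false;
    }

    // Symmetric unlock: succeeds only if `o` is the most recently locked object.
    // The `top > 0` guard avoids inspecting a slot before the first one when the
    // stack is empty; a caller would fall back to a slow path on `false`.
    boolean popIfTop(Object o) {
        if (top > 0 && slots[top - 1] == o) {
            slots[--top] = null; // clear the slot so no stale reference lingers
            return true;
        }
        return false;
    }

    int size() {
        return top;
    }
}
```

In this model an unlock of an object that is not on top (the asymmetric case) or an unlock against an empty stack both return `false`, which corresponds to taking the slow path rather than reading outside the array.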
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1483767964 From ecki at zusammenkunft.net Sat Mar 25 10:29:25 2023 From: ecki at zusammenkunft.net (Bernd) Date: Sat, 25 Mar 2023 11:29:25 +0100 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: <33D9DA9C-5539-40FC-8AAF-21926C6CFE9B@cox.net> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> , <33D9DA9C-5539-40FC-8AAF-21926C6CFE9B@cox.net> Message-ID: <8DB1B340-1DBD-5A46-BE6C-24364031F5B4@hxcore.ol> An HTML attachment was scrubbed... URL: From stuefe at openjdk.org Sat Mar 25 16:33:51 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 25 Mar 2023 16:33:51 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 23 Mar 2023 16:32:57 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > Is anybody familiar with the academic literature on this topic? I am sure I am not the first person which has come up with this form of locking. Maybe we could use a name that refers to some academic paper? @rkennke Would you mind integrating these changes to LockStack: https://github.com/tstuefe/jdk/tree/10907%2Blockstack-veris Mainly lockstack wiping unused slots on pop, more verifications and better output. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1483864128 From greggwon at cox.net Sat Mar 25 19:56:50 2023 From: greggwon at cox.net (Gregg G Wonderly) Date: Sat, 25 Mar 2023 14:56:50 -0500 Subject: Disallowing the dynamic loading of agents by default In-Reply-To: <8DB1B340-1DBD-5A46-BE6C-24364031F5B4@hxcore.ol> References: <8DB1B340-1DBD-5A46-BE6C-24364031F5B4@hxcore.ol> Message-ID: <39D27A60-7914-4109-A19E-96AB5608A4F5@cox.net> I understand you may have personal experiences with how you use Java. In my experience and that of others, Java has constantly had fundamental breakage in various details due to a lack of understanding, on the part of the platform development team(s), of what people actually do with Java. Sent from my iPhone > On Mar 25, 2023, at 5:29 AM, Bernd wrote: > > Gregg, I have to disagree: not only is Java one of the most stable platforms out there, but the "Enterprise Desktop" also couldn't care less about the default of this switch. > > Regards > Bernd > -- > http://bernd.eckenfels.net > > From: serviceability-dev on behalf of Gregg Wonderly > Sent: Friday, March 24, 2023 11:42 PM > To: Andrew Dinn > Cc: Ron Pressler ; jigsaw-dev at openjdk.org ; serviceability-dev at openjdk.org > Subject: Re: Disallowing the dynamic loading of agents by default > > Lots of people use Java in places where there is no "release" cycle of Java versions under the control of the users. These are "corporate users" in most cases and they have Java applications that they are using which will just "stop working" when a new version of Java is installed. > > Over the years, I've watched any favoritism towards Java on the desktop or as a general solution programming language wane, because it's undependable as a platform. You never know when something will break as these "stability" changes occur. People who use software systems are in large part not programmers or language/platform experts.
The inability of Oracle and many others to understand how detrimental this behavior has been is just mind blowing. > > People like myself end up looking like whining babies because we come back every once in a while to see if there is something useful happening in Java development that might finally stabilize the platform on the desktop and other business environments and lo and behold write-once-run-anywhere is found to still be unimplemented and basically unappreciated. It's just a sad, sad thing to see happening. > > Sun first did this with Java 1.2. The Community beat up on Sun severely and everything quieted down for a while. Then we had the JDK 1.5 release where my much mentioned volatile reachability optimizations broke software all over the place. > > This is not happening in any other language I am aware of. The people at Sun who were causing all the problems seemed to have gone on to Oracle and there is just a core group of people who just do not understand how horrible Java looks these days because of how much basic functionality got completely broken when a new version of Java showed up on general purpose computing and working software just stopped working... > > Gregg Wonderly > > > On Mar 24, 2023, at 12:21 PM, Andrew Dinn wrote: > > > > Hi Ron, > > > > Thank you for providing a heads up on the proposed JEP. The Red Hat Java team have been discussing this proposal. We have reviewed the original discussion and also the surrounding debate which established requirements for adaptation of Jigsaw to incorporate the needs of agents. > > > > As an aside, I'll note that a thorough review was necessary /even/ in my case, despite the fact that I was an active party, because the discussion occurred, and corresponding decisions were made, quite some time ago.
I mention this because it may explain the air of surprise and the desire to reiterate some of the original debate on the part of some respondents in this thread, who perhaps were not party, or only tangentially party, to the discussion. > > > > That also suggests that there may be a lot users who are not aware that the -XX:+EnableDynamicAgentLoading switch exists or do not really understand why it exists i.e. that there is a broad education issue at play here. > > > > We do have some concerns about the JEP, specifically about the timing of its delivery. These are probably best addressed via the normal review process. In particular that will ensure the discussion happens in a more suitable and more widely subscribed forum than the Jigsaw list. However, I will briefly mention our concerns in this reply. Before that let me start with a few disclaimers: > > > > - We acknowledge that there is little to be gained from re-iterating arguments made in the previous discussion (although that does not imply the JEP review would not benefit from new arguments, especially from those who were not involved in that discussion) > > > > - We recognize that the purpose of the -XX:+EnableDynamicAgentLoading switch is to offer a platform integrity guarantee and that this change of the default reflects a desire to prioritise integrity over the flexibility that agents provide > > > > - We recognize that the proposal is only proposing to flip a configuration default rather than detract from (or modify) available functionality > > > > - We recognize that changing this default will still allow (*most*) users to configure the behaviour they desire > > > > - We recognize that this advance notice has been given precisely to ensure that anyone wishing to deploy on jdk21 an app that relies on use of agents has time to plan appropriate configuration for their deployment > > > > - We recognize that this change of default is not being proposed for backport and hence that it will largely only affect 
the relatively small number of users who are currently developing for jdk21+ > > > > So, given that as a base for our comments where is the beef? > > > > - Our main concern is, predictably, timing. Clearly, this is a future, potential problem rather than a present problem - no one can be deploying on jdk21 yet and most developers who are currently preparing an app for deployment on jdk21+ will likely encounter the effect of this change before actual deployment and be in a position to remedy it. The concern is that advertising a change like this and getting users prepared to respond to it has always been difficult to achieve. In particular we expect a long tail of support problems from users who are trying to upgrade deployments from earlier releases to jdk21. > > So, while it is nice to have such early notice of the proposal we plan to review its likely impact on our users and how much time we need to prepare ourselves and our users to negotiate this change in behaviour. Any evidence we obtain to suggest a delay in targeting is appropriate will be brought to the JEP review. > > > > - A second, related concern is that flipping the default for this configuration in an LTS release as the first exposure to it for most people is more likely to derail deployment plans for users than if the default were flipped in a non-LTS release. If this change were deferred to jdk22 then that would give those planning deployment on (or upgrade to) jdk25 and also those planning to upgrade from jdk17 to jdk21 more time to discover and respond to the change. > > > > - A third concern, already pointed out by Volker, is that some users may run their Java apps via launcher apps or scripts that mask access to the Java command line. For such users the change of default may mean that they lose the option to deploy dynamic agents for important ancillary tasks such as observability. 
We are not clear how many of our users this affects but we will be looking into this and hope to bring feedback to the JEP review.
> > Obviously, this problem can be remedied relatively easily by the supplier of the launcher enabling agent use or providing a suitable control switch. Our concern is not with how to solve this problem rather how the involvement of two parties, supplier and end user, might imply a need for the JEP to be targeted to a later release.
> >
> > regards,
> >
> >
> > Andrew Dinn
> > -----------
> > Red Hat Distinguished Engineer
> > Red Hat UK Ltd
> > Registered in England and Wales under Company Registration No. 03798903
> > Directors: Michael Cunningham, Michael ("Mike") O'Neill
> >

From david.holmes at oracle.com Mon Mar 27 02:31:55 2023
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 27 Mar 2023 12:31:55 +1000
Subject: Missing EnclosingMethod attribute in JvmtiClassFileReconstituter.cpp
In-Reply-To:
References:
Message-ID:

Hi Manuel,

On 14/02/2023 9:14 pm, Manuel Álvarez Álvarez wrote:
> Dear all,
>
> When dealing with enclosed classes, frameworks like bytebuddy use the
> EnclosingMethod attribute in order to discover generic type argument
> bounds. When retransforming a class, the JvmtiClassFileReconstituter.cpp
> omits the enclosing attributes (they are available in the
> java.lang.Class object) so the bytes received by the transformer are
> missing the attribute potentially causing issues downstream.
>
> Are there any strong reasons why these attributes are not written by the
> JvmtiClassFileReconstituter?

I can't find anything specific about why this is missing, just a general note in the JVMTI spec for retransformClasses that some attributes may be missing:

"The initial class file bytes represent the bytes passed to ClassLoader.defineClass or RedefineClasses (before any transformations were applied), however they may not exactly match them.
The constant pool may differ in ways described in GetConstantPool. Constant pool indices in the bytecodes of methods will correspond. Some attributes may not be present."

I suspect the omission is simply because this is not an attribute that the VM uses in any way. Others may have more insight.

Cheers,
David

> Kind regards and thank you in advance,
>
> Manuel.
>
>

From dholmes at openjdk.org Mon Mar 27 02:49:57 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 27 Mar 2023 02:49:57 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To:
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Fri, 24 Mar 2023 17:11:46 GMT, Thomas Stuefe wrote:

> AFAIU, these places now could get outdated information from the contains function: either getting a false positive or a false negative when asking for a given object. But if I understand correctly, this had been the case before, too, since the information about ownership may be outdated as soon as one reads the BasicLock* address from the mark word. So these APIs only report a state that once had been, but never the current state.

Sure but my concern is not stale data, it is whether the lock-stack code may crash (or otherwise misbehave) if there are concurrent changes to it whilst it is being queried.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1484401692

From dholmes at openjdk.org Mon Mar 27 05:12:31 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 27 Mar 2023 05:12:31 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

On Fri, 24 Mar 2023 06:31:14 GMT, Leonid Mesnik wrote:

>> The TestScaffold incorrectly parses options; it should insert the wrapper class between the VM options and the application's class name.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
>   added comments and trim arguments

test/jdk/com/sun/jdi/TestScaffold.java line 469:

> 467: // the first argument not-starting with '-' is treated as a classname
> 468: // the other arguments are split to targetVMArgs targetAppCommandLine correspondingly
> 469: // The example of args for line '@run driver Frames2Test -Xss4M' is '-Xss4M' 'Frames2Targ'.

Is `Frames2Test` a typo there?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13170#discussion_r1148791133

From dholmes at openjdk.org Mon Mar 27 05:16:29 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 27 Mar 2023 05:16:29 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[])
In-Reply-To: <1-OWyOhgZaUx8BefLbyZyOJNZtZ9_LiKfnIsXMhT67I=.e826f98a-afea-45d2-b29b-33299bfbf498@github.com>
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> <1-OWyOhgZaUx8BefLbyZyOJNZtZ9_LiKfnIsXMhT67I=.e826f98a-afea-45d2-b29b-33299bfbf498@github.com>
Message-ID:

On Fri, 24 Mar 2023 06:27:02 GMT, Leonid Mesnik wrote:

> The problem was that the wrapper inserted its arguments before the VM options because it didn't expect them in the args.

Sorry still not sure what you mean by that. Do you mean the `--enable-preview` arg? That was appended to VM args:

    argInfo.targetVMArgs += "--enable-preview ";

which seems perfectly fine.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1484508003 From lmesnik at openjdk.org Mon Mar 27 05:45:31 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 27 Mar 2023 05:45:31 GMT Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2] In-Reply-To: References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> Message-ID: On Mon, 27 Mar 2023 05:09:55 GMT, David Holmes wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> added comments and trim arguments > > test/jdk/com/sun/jdi/TestScaffold.java line 469: > >> 467: // the first argument not-starting with '-' is treated as a classname >> 468: // the other arguments are split to targetVMArgs targetAppCommandLine correspondingly >> 469: // The example of args for line '@run driver Frames2Test -Xss4M' is '-Xss4M' 'Frames2Targ'. > > Is `Frames2Test` a typo there? Frames2Test is the name of test that uses additional command-line options. It is an example. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13170#discussion_r1148808465 From lmesnik at openjdk.org Mon Mar 27 05:54:29 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 27 Mar 2023 05:54:29 GMT Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2] In-Reply-To: References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> Message-ID: On Fri, 24 Mar 2023 06:31:14 GMT, Leonid Mesnik wrote: >> The TestScaffold incorrectly parse options, it should insert wrapper class between VM options and applications classame. 
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
>   added comments and trim arguments

Before the fix, parseArgs composed argInfo incorrectly, putting all args, including VM args, into targetAppCommandLine. So, without the wrapper, the result for '-Xss4M Frames2Targ' was

argInfo.targetAppCommandLine : -Xss4M Frames2Targ
argInfo.targetVMArgs :

That is not quite correct, but it didn't cause failures. It becomes a problem when the wrapper tries to insert a new class.

The old command:
argInfo.targetAppCommandLine : TestScaffold Virtual -Xss4M Frames2Targ
argInfo.targetVMArgs : --enable-preview

The new command:
argInfo.targetAppCommandLine : TestScaffold Virtual Frames2Targ
argInfo.targetVMArgs : -Xss4M --enable-preview

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1484533916

From dholmes at openjdk.org Mon Mar 27 05:54:31 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 27 Mar 2023 05:54:31 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

On Mon, 27 Mar 2023 05:42:12 GMT, Leonid Mesnik wrote:

>> test/jdk/com/sun/jdi/TestScaffold.java line 469:
>>
>>> 467: // the first argument not-starting with '-' is treated as a classname
>>> 468: // the other arguments are split to targetVMArgs targetAppCommandLine correspondingly
>>> 469: // The example of args for line '@run driver Frames2Test -Xss4M' is '-Xss4M' 'Frames2Targ'.
>>
>> Is `Frames2Test` a typo there?
>
> Frames2Test is the name of test that uses additional command-line options. It is an example.

In that case did you mean:

'@run driver Frames2Test -Xss4M Frames2Targ'

?
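
[Editor's illustration] The argument split discussed in this thread can be sketched roughly as follows. This is a minimal sketch and not the actual TestScaffold.parseArgs code: the class `ArgSplitSketch`, the method `split`, and the simplified `ArgInfo` holder are hypothetical names invented for this example. It assumes only what the quoted source comments and the reported argInfo values state: arguments starting with '-' that precede the class name are VM options, the first argument not starting with '-' is the class name, and the wrapper goes in front of the class name while `--enable-preview` is appended to the VM options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the real TestScaffold implementation.
class ArgSplitSketch {

    // Simplified stand-in for the ArgInfo object mentioned in the thread.
    static final class ArgInfo {
        String targetVMArgs = "";
        String targetAppCommandLine = "";
    }

    // Leading '-' arguments are treated as VM options; the first argument that
    // does not start with '-' is the class name; everything from the class name
    // onwards belongs to the application command line.
    static ArgInfo split(String[] args, boolean useWrapper) {
        List<String> vmArgs = new ArrayList<>();
        List<String> appArgs = new ArrayList<>();
        boolean classNameSeen = false;
        for (String raw : args) {
            String arg = raw.trim();
            if (arg.isEmpty()) {
                continue; // ignore blank arguments
            }
            if (!classNameSeen && arg.startsWith("-")) {
                vmArgs.add(arg);
            } else {
                classNameSeen = true;
                appArgs.add(arg);
            }
        }
        if (useWrapper) {
            // The wrapper is placed before the class name, after the VM options.
            appArgs.add(0, "Virtual");
            appArgs.add(0, "TestScaffold");
            vmArgs.add("--enable-preview");
        }
        ArgInfo info = new ArgInfo();
        info.targetVMArgs = String.join(" ", vmArgs);
        info.targetAppCommandLine = String.join(" ", appArgs);
        return info;
    }

    public static void main(String[] args) {
        ArgInfo info = split(new String[] {"-Xss4M", "Frames2Targ"}, true);
        System.out.println("argInfo.targetAppCommandLine : " + info.targetAppCommandLine);
        System.out.println("argInfo.targetVMArgs : " + info.targetVMArgs);
    }
}
```

Under these assumptions the sketch yields the values reported as "the new command" for '-Xss4M Frames2Targ'. It deliberately ignores the `-J` prefix convention raised later in the thread.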
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13170#discussion_r1148813438 From dholmes at openjdk.org Mon Mar 27 05:58:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 Mar 2023 05:58:30 GMT Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2] In-Reply-To: References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> Message-ID: On Fri, 24 Mar 2023 06:31:14 GMT, Leonid Mesnik wrote: >> The TestScaffold incorrectly parse options, it should insert wrapper class between VM options and applications classame. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > added comments and trim arguments Isn't the `-Xss4M` supposed to be passed as `-J-Xss4M`? That is what the original logic is expecting. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1484537292 From stuefe at openjdk.org Mon Mar 27 06:39:53 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 Mar 2023 06:39:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Mon, 27 Mar 2023 02:46:31 GMT, David Holmes wrote: > > AFAIU, these places now could get outdated information from the contains function: either getting a false positive or a false negative when asking for a given object. But if I understand correctly, this had been the case before, too. Since the information about ownership may be outdated as soon as one read the BasicLock* address from the markwork. So these APIs do only report a state that once had been, but never the current state. > > Sure but my concern is not stale data, it is whether the lock-stack code may crash (or otherwise misbehave) if there are concurrent changes to it whilst it is being queried. 
LockStack is a fixed-size array of oops with a pointer-sized member indicating the current offset. It is embedded into Thread.

Modifying the LockStack (push, pop, remove) is non-atomic: write the element, then update the current offset. The contains function - the only function that is not called solely from the current thread (and we should assert that for the others) - reads the current offset, then iterates the array up to that offset, comparing each oop with the given one.

The only way I see this crashing is if we read the current offset in a half-written state. Not sure if any of our platforms read/write a pointer-sized field non-atomically. All other misreads would "just" result in a time window where contains() gives the wrong answer. So we should read the offset atomically.

Alternatively, have a second contains(), e.g. `contains_full()`, just for the sake of concurrent iteration, and let that one iterate the whole stack incl. unused slots. Iteration limits would be hardcoded offsets from Thread*, no need to read the offset. The obvious disadvantage would be that we'd need to mark/zero out popped slots in release builds.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1484583227

From duke at openjdk.org Mon Mar 27 08:46:30 2023
From: duke at openjdk.org (Eirik Bjorsnos)
Date: Mon, 27 Mar 2023 08:46:30 GMT
Subject: RFR: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java
In-Reply-To:
References:
Message-ID:

On Mon, 20 Mar 2023 19:47:10 GMT, Eirik Bjorsnos wrote:

> Please review this PR which replaces the use of outdated JVM flags for setting up debugging in the test value004.java
>
> This is part of an ongoing effort to remove use of the outdated flag '-Djava.compiler' such that the option itself can eventually be removed.

This approved PR is looking for a sponsor.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13107#issuecomment-1484743714

From duke at openjdk.org Mon Mar 27 08:47:32 2023
From: duke at openjdk.org (Eirik Bjorsnos)
Date: Mon, 27 Mar 2023 08:47:32 GMT
Subject: RFR: 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java
In-Reply-To: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com>
References: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com>
Message-ID:

On Mon, 20 Mar 2023 20:53:41 GMT, Eirik Bjorsnos wrote:

> Please review this PR which removes the following outdated guard/check from SunCommandLineLauncher:
>
>
>     if ((options.indexOf("-Djava.compiler=") != -1) &&
>         (options.toLowerCase().indexOf("-djava.compiler=none") == -1)) {
>         throw new IllegalConnectorArgumentsException("Cannot debug with a JIT compiler",
>                                                      ARG_OPTIONS);
>     }
>
>
>
> Efforts are underway to remove the `java.compiler` system property entirely. Besides, this check no longer makes sense since debugging with a JIT has been supported for a while.

This approved PR is looking for a sponsor.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13109#issuecomment-1484745199

From gcao at openjdk.org Mon Mar 27 10:20:44 2023
From: gcao at openjdk.org (Gui Cao)
Date: Mon, 27 Mar 2023 10:20:44 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v13]
In-Reply-To:
References:
Message-ID:

On Sat, 25 Mar 2023 05:11:16 GMT, Fei Yang wrote:

>> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Improved interpreter comments aarch64
>
> Changes requested by fyang (Reviewer).

More changes for RISC-V which resolve @RealFYang's review comments.
@matias9927 Please help integrate this https://github.com/zifeihan/jdk/commit/413dd02221598edba9e44ee7d9b20414bde76243 (on this branch: https://github.com/zifeihan/jdk/commits/12778_riscv_port_v4) By the way, the riscv port passed the tier1-3 tests on unmatched board with no new errors introduced. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1484884543 From ron.pressler at oracle.com Mon Mar 27 10:32:24 2023 From: ron.pressler at oracle.com (Ron Pressler) Date: Mon, 27 Mar 2023 10:32:24 +0000 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> Message-ID: <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> Hi Andrew! > On 24 Mar 2023, at 17:21, Andrew Dinn wrote: > > Hi Ron, > > Thank you for providing a heads up on the proposed JEP. The Red Hat Java team have been discussing this proposal. We have reviewed the original discussion and also the surrounding debate which established requirements for adaptation of Jigsaw to incorporate the needs of agents. > > As an aside, I'll note that a thorough review was necessary /even/ in my case, despite the fact that I was an active party, because the discussion occurred, and corresponding decisions were made, quite some time ago. I mention this because it may explain the air of surprise and the desire to reiterate some of the original debate on the part of some respondents in this thread, who perhaps were not party, or only tangentially party, to the discussion. > > That also suggests that there may be a lot users who are not aware that the -XX:+EnableDynamicAgentLoading switch exists or do not really understand why it exists i.e. that there is a broad education issue at play here. Understood. > > We do have some concerns about the JEP, specifically about the timing of its delivery. These are probably best addressed via the normal review process. 
In particular that will ensure the discussion happens in a more suitable and more widely subscribed forum than the Jigsaw list. However, I will briefly mention our concerns in this reply. Before that let me start with a few disclaimers:
>
> - We acknowledge that there is little to be gained from re-iterating arguments made in the previous discussion (although that does not imply the JEP review would not benefit from new arguments, especially from those who were not involved in that discussion)
>
> - We recognize that the purpose of the -XX:+EnableDynamicAgentLoading switch is to offer a platform integrity guarantee and that this change of the default reflects a desire to prioritise integrity over the flexibility that agents provide

I would qualify that: we want to prioritise integrity *by default*. Integrity is only practical when it is the default, as adding more integrity rules after the fact is hard to the point of being impractical. Agents in themselves don't impact that default, but dynamically loaded agents do.

>
> - We recognize that the proposal is only proposing to flip a configuration default rather than detract from (or modify) available functionality
>
> - We recognize that changing this default will still allow (*most*) users to configure the behaviour they desire
>
> - We recognize that this advance notice has been given precisely to ensure that anyone wishing to deploy on jdk21 an app that relies on use of agents has time to plan appropriate configuration for their deployment
>
> - We recognize that this change of default is not being proposed for backport and hence that it will largely only affect the relatively small number of users who are currently developing for jdk21+
>
> So, given that as a base for our comments where is the beef?
>
> - Our main concern is, predictably, timing.
Clearly, this is a future, potential problem rather than a present problem - no one can be deploying on jdk21 yet and most developers who are currently preparing an app for deployment on jdk21+ will likely encounter the effect of this change before actual deployment and be in a position to remedy it. The concern is that advertising a change like this and getting users prepared to respond to it has always been difficult to achieve. In particular we expect a long tail of support problems from users who are trying to upgrade deployments from earlier releases to jdk21.
> So, while it is nice to have such early notice of the proposal we plan to review its likely impact on our users and how much time we need to prepare ourselves and our users to negotiate this change in behaviour. Any evidence we obtain to suggest a delay in targeting is appropriate will be brought to the JEP review.

Very well, we can certainly discuss timing as part of the JEP discussion.

The JEP itself is rather ambitious because I noticed that most of the strong encapsulation JEPs lacked a substantial motivation section (largely because they had been written before that became the norm) and so strong encapsulation has not been motivated in JEP form, and this is an opportunity to start rectifying that. This is important because some, even in this discussion, are under the impression that integrity is about security. Although security is certainly one of integrity's impacts (albeit not in the way hypothesised in this discussion) other major impacts are on performance (including, though certainly not limited to, link-time optimisations that are of interest to Project Leyden) and code evolution (the lack of integrity has been, by far, the biggest cause of JDK upgrade issues experienced by many applications and libraries).
>
> - A second, related concern is that flipping the default for this configuration in an LTS release as the first exposure to it for most people is more likely to derail deployment plans for users than if the default were flipped in a non-LTS release. If this change were deferred to jdk22 then that would give those planning deployment on (or upgrade to) jdk25 and also those planning to upgrade from jdk17 to jdk21 more time to discover and respond to the change.

While we should certainly discuss timing after publishing the draft JEP, I'm not sure how relevant this particular argument is. Those who upgrade from 17 to 21 don't care which of the versions they skipped introduced a change, and even the deprecation process does not take into account versions for which Oracle and other vendors choose to offer an LTS service. JDK 17 itself also introduced a far bigger tightening of strong encapsulation than the one discussed here. Furthermore, those who wish to upgrade from one version that has LTS offerings to another avail themselves of the LTS service to upgrade not immediately when the new version is released, so they are under no time pressure.

>
> - A third concern, already pointed out by Volker, is that some users may run their Java apps via launcher apps or scripts that mask access to the Java command line. For such users the change of default may mean that they lose the option to deploy dynamic agents for important ancillary tasks such as observability. We are not clear how many of our users this affects but we will be looking into this and hope to bring feedback to the JEP review.
> Obviously, this problem can be remedied relatively easily by the supplier of the launcher enabling agent use or providing a suitable control switch. Our concern is not with how to solve this problem rather how the involvement of two parties, supplier and end user, might imply a need for the JEP to be targeted to a later release.

This, too, is an argument that's hard for me to understand. First, many JDK releases require changes to the command line, for various reasons. JDK 17 required bigger changes than the one announced here, and JDK 21 itself may well require other such changes that impact even more applications than this one - making it an opportunity rather than a liability. Second, such changes are normally announced *later* than this one has. If an application under such constraints always uses the current JDK release, then surely a six-month notice is enough, and if it opts to use an LTS service, then it's under no pressure.

Anyway, let's continue this discussion after I publish the draft JEP.

- Ron

From ron.pressler at oracle.com Mon Mar 27 10:46:58 2023
From: ron.pressler at oracle.com (Ron Pressler)
Date: Mon, 27 Mar 2023 10:46:58 +0000
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To: <39D27A60-7914-4109-A19E-96AB5608A4F5@cox.net>
References: <8DB1B340-1DBD-5A46-BE6C-24364031F5B4@hxcore.ol> <39D27A60-7914-4109-A19E-96AB5608A4F5@cox.net>
Message-ID: <67A419ED-7C7B-4548-BD2A-922AFC9422B2@oracle.com>

> On 25 Mar 2023, at 19:56, Gregg G Wonderly wrote:
>
> I understand you may have personal experiences with how you use Java. In my experience and others, Java has constantly had fundamental breakage in various details due to lack of understanding, on the platform development team(s) of what people actually do with Java.
>
> Sent from my iPhone

Since at this point it is an established fact that the vast majority of JDK upgrade difficulties (certainly weighed by cost) have been the result of lack of integrity guarantees made by strong encapsulation, those who care about stability should welcome this change and the others to follow that will close the remaining loopholes in strong encapsulation.
Strong encapsulation has already proven itself over the past several releases in allowing us to make big internal changes to the platform (such as those required for virtual threads) with little adverse effect on compatibility, something that wasn't possible before. The success of strong encapsulation in that area shows that it's the right approach for letting Java evolve, even in very significant ways, and at the same time reduce upgrade costs.

- Ron

From volker.simonis at gmail.com Mon Mar 27 12:37:48 2023
From: volker.simonis at gmail.com (Volker Simonis)
Date: Mon, 27 Mar 2023 14:37:48 +0200
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To: <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com>
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com>
Message-ID:

On Mon, Mar 27, 2023 at 12:32 PM Ron Pressler wrote:
>
> Hi Andrew!
>
> > On 24 Mar 2023, at 17:21, Andrew Dinn wrote:
> >
> > Hi Ron,
> >
> > Thank you for providing a heads up on the proposed JEP. The Red Hat Java team have been discussing this proposal. We have reviewed the original discussion and also the surrounding debate which established requirements for adaptation of Jigsaw to incorporate the needs of agents.
> >
> > As an aside, I'll note that a thorough review was necessary /even/ in my case, despite the fact that I was an active party, because the discussion occurred, and corresponding decisions were made, quite some time ago. I mention this because it may explain the air of surprise and the desire to reiterate some of the original debate on the part of some respondents in this thread, who perhaps were not party, or only tangentially party, to the discussion.
> >
> > That also suggests that there may be a lot users who are not aware that the -XX:+EnableDynamicAgentLoading switch exists or do not really understand why it exists i.e. that there is a broad education issue at play here.
>
> Understood.
>
> >
> > We do have some concerns about the JEP, specifically about the timing of its delivery. These are probably best addressed via the normal review process. In particular that will ensure the discussion happens in a more suitable and more widely subscribed forum than the Jigsaw list. However, I will briefly mention our concerns in this reply. Before that let me start with a few disclaimers:
> >
> > - We acknowledge that there is little to be gained from re-iterating arguments made in the previous discussion (although that does not imply the JEP review would not benefit from new arguments, especially from those who were not involved in that discussion)
> >
> > - We recognize that the purpose of the -XX:+EnableDynamicAgentLoading switch is to offer a platform integrity guarantee and that this change of the default reflects a desire to prioritise integrity over the flexibility that agents provide
>
> I would qualify that: we want to prioritise integrity *by default*. Integrity is only practical when it is the default, as adding more integrity rules after the fact is hard to the point of being impractical. Agents in themselves don't impact that default, but dynamically loaded agents do.
>
> >
> > - We recognize that the proposal is only proposing to flip a configuration default rather than detract from (or modify) available functionality
> >
> > - We recognize that changing this default will still allow (*most*) users to configure the behaviour they desire
> >
> > - We recognize that this advance notice has been given precisely to ensure that anyone wishing to deploy on jdk21 an app that relies on use of agents has time to plan appropriate configuration for their deployment
> >
> > - We recognize that this change of default is not being proposed for backport and hence that it will largely only affect the relatively small number of users who are currently developing for jdk21+
> >
> > So, given that as a base for our comments where is the beef?
> >
> > - Our main concern is, predictably, timing. Clearly, this is a future, potential problem rather than a present problem - no one can be deploying on jdk21 yet and most developers who are currently preparing an app for deployment on jdk21+ will likely encounter the effect of this change before actual deployment and be in a position to remedy it. The concern is that advertising a change like this and getting users prepared to respond to it has always been difficult to achieve. In particular we expect a long tail of support problems from users who are trying to upgrade deployments from earlier releases to jdk21.
> > So, while it is nice to have such early notice of the proposal we plan to review its likely impact on our users and how much time we need to prepare ourselves and our users to negotiate this change in behaviour. Any evidence we obtain to suggest a delay in targeting is appropriate will be brought to the JEP review.
>
> Very well, we can certainly discuss timing as part of the JEP discussion.
>
> The JEP itself is rather ambitious because I noticed that most of the strong encapsulation JEPs lacked a substantial motivation section (largely because they had been written before that became the norm) and so strong encapsulation has not been motivated in JEP form, and this is an opportunity to start rectifying that. This is important because some, even in this discussion, are under the impression that integrity is about security. Although security is certainly one of integrity's impacts (albeit not in the way hypothesised in this discussion) other major impacts are on performance (including, though certainly not limited to, link-time optimisations that are of interest to Project Leyden) and code evolution (the lack of integrity has been, by far, the biggest cause of JDK upgrade issues experienced by many applications and libraries).

Thanks for pointing out that "integrity" and "security" are two different things and that this discussion is mostly about other aspects of "integrity" like performance and code evolution. This is actually exactly why I already tried to ask several times about the "real" and/or the long term background of this change.

Currently the JEP only seems to propose the change of the default value for dynamic agent loading. It is obviously not hard for other JDK vendors to use a different default and I agree that it is probably still manageable (though inconvenient) for administrators/operators to change the default at launch time.

BUT, you rightly mention that once higher integrity is the DEFAULT, this opens the door for future optimizations (you listed some of them) and even completely different execution models / semantics for Java applications (e.g. as explored by Project Leyden). Once such new optimizations are in place (and only work for the default, disabled dynamic agent loading setting) it will be much harder for users who depend on dynamic loading to enable it, because it will either impact their performance or it will limit their ability to use certain platform features.

My main concern with the proposed change is not the current proposal but the impact it will have on the evolution of Java. Java's dynamic features are one of its biggest strengths and a major reason for its success. Sacrificing some of them or making their usage increasingly expensive requires a broader discussion in the community and shouldn't happen "under the hood".

I'm happy to continue that discussion on the actual JEP proposal.

>
> >
> > - A second, related concern is that flipping the default for this configuration in an LTS release as the first exposure to it for most people is more likely to derail deployment plans for users than if the default were flipped in a non-LTS release.
If this change were deferred to jdk22 then that would give those planning deployment on (or upgrade to) jdk25 and also those planning to upgrade from jdk17 to jdk21 more time to discover and respond to the change. > > While we should certainly discuss timing after publishing the draft JEP, I'm not sure how relevant this particular argument is. Those who upgrade from 17 to 21 don't care which of the versions they skipped introduced a change, and even the deprecation process does not take into account versions for which Oracle and other vendors choose to offer an LTS service. JDK 17 itself also introduced a far bigger tightening of strong encapsulation than the one discussed here. Furthermore, those who wish to upgrade from one version that has LTS offerings to another avail themselves of the LTS service to upgrade not immediately when the new version is released, so they are under no time pressure. > > > > > - A third concern, already pointed out by Volker, is that some users may run their Java apps via launcher apps or scripts that mask access to the Java command line. For such users the change of default may mean that they lose the option to deploy dynamic agents for important ancillary tasks such as observability. We are not clear how many of our users this affects but we will be looking into this and hope to bring feedback to the JEP review. > > Obviously, this problem can be remedied relatively easily by the supplier of the launcher enabling agent use or providing a suitable control switch. Our concern is not with how to solve this problem rather how the involvement of two parties, supplier and end user, might imply a need for the JEP to be targeted to a later release. > > This, too, is an argument that's hard for me to understand. First, many JDK releases require changes to the command line, for various reasons. 
JDK 17 required bigger changes than the one announced here, and JDK 21 itself may well require other such changes that impact even more applications than this one - making it an opportunity rather than a liability. Second, such changes are normally announced *later* than this one has. If an application under such constraints always uses the current JDK release, then surely a six-month notice is enough, and if it opts to use an LTS service, then it's under no pressure. > > Anyway, let's continue this discussion after I publish the draft JEP. > > -- Ron > > From duke at openjdk.org Mon Mar 27 14:37:25 2023 From: duke at openjdk.org (Eirik Bjorsnos) Date: Mon, 27 Mar 2023 14:37:25 GMT Subject: Integrated: 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java In-Reply-To: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> References: <8uSaJCACWRUvS5faPTBvlBMNhRYQb_TW7AgQhKUNCoI=.9de3a32a-028e-4ca2-8914-f2fcde9d34af@github.com> Message-ID: On Mon, 20 Mar 2023 20:53:41 GMT, Eirik Bjorsnos wrote: > Please review this PR which removes the following outdated guard/check from SunCommandLineLauncher: > > > if ((options.indexOf("-Djava.compiler=") != -1) && > (options.toLowerCase().indexOf("-djava.compiler=none") == -1)) { > throw new IllegalConnectorArgumentsException("Cannot debug with a JIT compiler", > ARG_OPTIONS); > } > > > > Efforts are underway to remove the `java.compiler` system property entirely, besides that this test no longer makes sense since debugging with a JIT has been supported for a while. This pull request has now been integrated. 
Changeset: 46b06023 Author: Eirik Bjorsnos Committer: Alan Bateman URL: https://git.openjdk.org/jdk/commit/46b0602376893df204bf4d624938bf89abe04d89 Stats: 7 lines in 1 file changed: 0 ins; 6 del; 1 mod 8304547: Remove checking of -Djava.compiler in src/jdk.jdi/share/classes/com/sun/tools/jdi/SunCommandLineLauncher.java Reviewed-by: dholmes, cjplummer, alanb ------------- PR: https://git.openjdk.org/jdk/pull/13109 From matsaave at openjdk.org Mon Mar 27 14:43:04 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 27 Mar 2023 14:43:04 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v14] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: RISCV patch and aarch64 improvement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/ff7f3503..84ed272a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=12-13 Stats: 21 lines in 3 files changed: 3 ins; 7 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From stuefe at openjdk.org Mon Mar 27 15:12:50 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 Mar 2023 15:12:50 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: <5y28pIWRTKPThWndmhdujCKNuvUMAbQwo2yXa3TUBv8=.3a1970ae-b732-4c6e-b96a-063b5c9d498e@github.com> On Thu, 23 Mar 2023 16:32:57 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > Is anybody familiar with the academic literature on this topic? I am sure I am not the first person which has come up with this form of locking. Maybe we could use a name that refers to some academic paper? @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485293012 From rkennke at openjdk.org Mon Mar 27 15:57:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 27 Mar 2023 15:57:13 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 23 Mar 2023 16:32:57 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > Is anybody familiar with the academic literature on this topic? I am sure I am not the first person which has come up with this form of locking. Maybe we could use a name that refers to some academic paper? > @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ? That would be a question for @fisk and/or @stefank. AFAIK, the color bits should be masked by ZGC barriers *before* the oops enter the synchronization subsystem. But I kinda suspect that we are somehow triggering a ZGC bug here. Maybe we require barriers when reading oops from the lock-stack too? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485390285 From adinn at redhat.com Mon Mar 27 16:06:32 2023 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 27 Mar 2023 17:06:32 +0100 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> Message-ID: <6aba7a4c-acc0-8026-e35e-e123200dab1c@redhat.com> On 27/03/2023 13:37, Volker Simonis wrote: >> The JEP itself is rather ambitious because I noticed that most of the strong encapsulation JEPs lacked a substantial motivation section (largely because they had been written before that became the norm) and so strong encapsulation has not been motivated in JEP form, and this is an opportunity to start rectifying that. This is important because some, even in this discussion, are under the impression that integrity is about security. Although security is certainly one of integrity's impacts (albeit not in the way hypothesised in this discussion) other major impacts are on performance (including, though certainly not limited to, link-time optimisations that are of interest to Project Leyden) and code evolution (the lack of integrity has been, by far, the biggest cause of JDK upgrade issues experienced by many applications and libraries). > > Thanks for pointing out that "integrity" and "security" are two > different things and that this discussion is mostly about other > aspects of "integrity" like performance and code evolution. This is > actually exactly why I already tried to ask several times about the > "real" and/or the long term background of this change. Currently the > JEP only seems to propose the change of the default value for dynamic > agent loading. 
It is obviously not hard for other JDK vendors to use a > different default and I agree that it is probably still manageable > (though inconvenient) for administrators/operators to change the > default at launch time. BUT, you rightly mention that once higher > integrity is the DEFAULT, this opens the door for future optimizations > (you listed some of them) and even completely different execution > models / semantics for Java applications (e.g. as explored by Project > Leyden). Once such new optimizations are in place (and only work > for the default, disabled dynamic agent loading setting) it will be > much harder for users who depend on dynamic loading to enable it, > because it will either impact their performance or it will limit their > ability to use certain platform features. Red Hat's team had discussed this issue and we did think about raising it in the previous reply. It is perhaps better left for the JEP discussion but we certainly consider it as important an issue as Volker does. Personally, I will note that this has been point I have been concerned by since the associated dilemma was raised by Mark Reinhold during the original discussion. Indeed, the possibility that retaining agents might necessitate a divergence in JVM (or even JDK) behaviour from the status quo when no agent is, nor can be, installed struck me very hard. My take-away was not an immediate abreaction. Rather: the opportunity agents provide for dynamic adaptation of the runtime is provisional and, needs to be balanced against the benefits that might accrue from them not being the picture; in the longer term some of the capabilities provided through the use of agents may be unsustainable. As an example of that I considered my agent, Byteman. It is immensely useful for injecting faults, validity assertions and monitoring capabilities into app code during unit system testing, avoiding the need for that code to actually appear in the product. 
The precise targeting of Byteman rules to specific code locations means that the injected code only minimally perturbs the whole code base. Notably, Byteman does not support bulk (online) transformation. Too much transformed code means you are effectively testing a very different app to the one you deploy, at least as far as execution speed, timings, resource use etc are concerned, and possibly even because of too radical a change to the app semantics. Clearly, if simply loading an agent provides opportunities may eventually lead to for the runtime being able to operate more efficiently and, as a result, cause in a significant change in execution speed, timings, resource use then a tool like Byteman becomes much less useful for this sort of testing. So, for me the writing was already on the wall back in jdk9 time. We may well have to trade off of some build/test time benefits against the undeniable impetus to improve deploy-time performance. > My main concern with the proposed change is not the current proposal > but the impact it will have on the evolution of Java. Java's dynamic > features are one of its biggest strengths and a major reason for its > success. Sacrificing some of them or making their usage increasingly > expensive requires a broader discussion in the community and shouldn't > happen "under the hood". I'm happy to continue that discussion on the > actual JEP proposal. Amen to that. 
regards, Andrew Dinn ----------- From adinn at redhat.com Mon Mar 27 16:10:47 2023 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 27 Mar 2023 17:10:47 +0100 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: <6aba7a4c-acc0-8026-e35e-e123200dab1c@redhat.com> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> <6aba7a4c-acc0-8026-e35e-e123200dab1c@redhat.com> Message-ID: <862f5601-9930-2930-5b67-429537f75ba3@redhat.com> Sorry, let me correct some of my mangled grammar On 27/03/2023 17:06, Andrew Dinn wrote: > On 27/03/2023 13:37, Volker Simonis wrote: >>> The JEP itself is rather ambitious because I noticed that most of the >>> strong encapsulation JEPs lacked a substantial motivation section >>> (largely because they had been written before that became the norm) >>> and so strong encapsulation has not been motivated in JEP form, and >>> this is an opportunity to start rectifying that. This is important >>> because some, even in this discussion, are under the impression that >>> integrity is about security. Although security is certainly one of >>> integrity's impacts (albeit not in the way hypothesised in this >>> discussion) other major impacts are on performance (including, though >>> certainly not limited to, link-time optimisations that are of >>> interest to Project Leyden) and code evolution (the lack of integrity >>> has been, by far, the biggest cause of JDK upgrade issues experienced >>> by many applications and libraries). >> >> Thanks for pointing out that "integrity" and "security" are two >> different things and that this discussion is mostly about other >> aspects of "integrity" like performance and code evolution. This is >> actually exactly why I already tried to ask several times about the >> "real" and/or the long term background of this change. 
Currently the >> JEP only seems to propose the change of the default value for dynamic >> agent loading. It is obviously not hard for other JDK vendors to use a >> different default and I agree that it is probably still manageable >> (though inconvenient) for administrators/operators to change the >> default at launch time. BUT, you rightly mention that once higher >> integrity is the DEFAULT, this opens the door for future optimizations >> (you listed some of them) and even completely different execution >> models / semantics for Java applications (e.g. as explored by Project >> Leyden). Once such new optimizations are in place (and only work >> for the default, disabled dynamic agent loading setting) it will be >> much harder for users who depend on dynamic loading to enable it, >> because it will either impact their performance or it will limit their >> ability to use certain platform features. > > Red Hat's team had discussed this issue and we did think about raising > it in the previous reply. It is perhaps better left for the JEP > discussion but we certainly consider it as important an issue as Volker > does. > > Personally, I will note that this is a point I have been concerned > by since the associated dilemma was raised by Mark Reinhold during the > original discussion. Indeed, the possibility that retaining agents might > necessitate a divergence in JVM (or even JDK) behaviour from the status > quo when no agent is, nor can be, installed struck me very hard. My > take-away was not an immediate abreaction. Rather: the opportunity > agents provide for dynamic adaptation of the runtime is provisional and, > needs to be balanced against the benefits that might accrue from them > not being the picture; in the longer term some of the capabilities > provided through the use of agents may be unsustainable. > > As an example of that I considered my agent, Byteman. 
It is immensely > useful for injecting faults, validity assertions and monitoring > capabilities into app code during unit system testing, avoiding the need > for that code to actually appear in the product. The precise targeting > of Byteman rules to specific code locations means that the injected code > only minimally perturbs the whole code base. > > Notably, Byteman does not support bulk (online) transformation. Too much > transformed code means you are effectively testing a very different app > to the one you deploy, at least as far as execution speed, timings, > resource use etc are concerned, and possibly even because of too radical > a change to the app semantics. > > Clearly, if simply loading an agent removes opportunities > for the runtime to operate more > efficiently and, as a result, causes a significant change in execution > speed, timings, resource use then a tool like Byteman becomes much less > useful for this sort of testing. So, for me the writing was already on > the wall back in jdk9 time. We may well have to trade off of some > build/test time benefits against the undeniable impetus to improve > deploy-time performance. > >> My main concern with the proposed change is not the current proposal >> but the impact it will have on the evolution of Java. Java's dynamic >> features are one of its biggest strengths and a major reason for its >> success. Sacrificing some of them or making their usage increasingly >> expensive requires a broader discussion in the community and shouldn't >> happen "under the hood". I'm happy to continue that discussion on the >> actual JEP proposal. > Amen to that. > > regards, > > > Andrew Dinn > ----------- > -- regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From adinn at redhat.com Mon Mar 27 16:30:41 2023 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 27 Mar 2023 17:30:41 +0100 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> Message-ID: <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com> Hi Ron, Thanks for the reply, I believe we have established a lot of common ground here. I'll try to clarify a couple of the things you found difficult to follow. On 27/03/2023 11:32, Ron Pressler wrote: >> - A second, related concern is that flipping the default for this configuration in an LTS release as the first exposure to it for most people is more likely to derail deployment plans for users than if the default were flipped in a non-LTS release. If this change were deferred to jdk22 then that would give those planning deployment on (or upgrade to) jdk25 and also those planning to upgrade from jdk17 to jdk21 more time to discover and respond to the change. > > While we should certainly discuss timing after publishing the draft JEP, I'm not sure how relevant this particular argument is. Those who upgrade from 17 to 21 don't care which of the versions they skipped introduced a change, and even the deprecation process does not take into account versions for which Oracle and other vendors choose to offer an LTS service. JDK 17 itself also introduced a far bigger tightening of strong encapsulation than the one discussed here. Furthermore, those who wish to upgrade from one version that has LTS offerings to another avail themselves of the LTS service to upgrade not immediately when the new version is released, so they are under no time pressure. It seems to me to be a fairly simple point but I obviously didn't express it very well. Here's another try. 
If this is pushed in jdk21 then anyone currently developing or upgrading an app to target jdk21 will only have been able to test on jdk17-jdk20 where they will not encounter the issue. So, this only leaves them a small window to detect that there will be a problem using agents in jdk21. When jdk21 arrives this may force them to delay deployment or they may even deploy unaware that the problem exists. If this is pushed in jdk22 instead of jdk21 then anyone who upgrades from jdk17 to jdk21 will not have a problem. Anyone working on an app for deployment on jdk25 will have the opportunity to test on 3 non-LTS releases which might manifest the potential agent problem before deployment. I hope that explains the problem better. >> - A third concern, already pointed out by Volker, is that some users may run their Java apps via launcher apps or scripts that mask access to the Java command line. For such users the change of default may mean that they lose the option to deploy dynamic agents for important ancillary tasks such as observability. We are not clear how many of our users this affects but we will be looking into this and hope to bring feedback to the JEP review. >> Obviously, this problem can be remedied relatively easily by the supplier of the launcher enabling agent use or providing a suitable control switch. Our concern is not with how to solve this problem rather how the involvement of two parties, supplier and end user, might imply a need for the JEP to be targeted to a later release. > > This, too, is an argument that's hard for me to understand. First, many JDK releases require changes to the command line, for various reasons. JDK 17 required bigger changes than the one announced here, and JDK 21 itself may well require other such changes that impact even more applications than this one - making it an opportunity rather than a liability. Second, such changes are normally announced *later* than this one has. 
If an application under such constraints always uses the current JDK release, then surely a six-month notice is enough, and if it opts to use an LTS service, then it?s under no pressure. Of course, I accept that changes to command line options are nothing new. However, I don't quite see how to get from there to the implication that this specific change cannot therefore raise concerns. I think the truth of that conclusion has to depend on details of the change, specifically what the effect might be on users. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From eosterlund at openjdk.org Mon Mar 27 17:33:19 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 27 Mar 2023 17:33:19 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Mon, 27 Mar 2023 15:53:47 GMT, Roman Kennke wrote: > > @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ? > > > > That would be a question for @fisk and/or @stefank. AFAIK, the color bits should be masked by ZGC barriers *before* the oops enter the synchronization subsystem. But I kinda suspect that we are somehow triggering a ZGC bug here. Maybe we require barriers when reading oops from the lock-stack too? Oops that are processed in Thread::oops_do should not have load barriers. Other oops should have load barriers. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485426029 From rkennke at openjdk.org Mon Mar 27 17:33:20 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 27 Mar 2023 17:33:20 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Mon, 27 Mar 2023 15:53:47 GMT, Roman Kennke wrote: >> Is anybody familiar with the academic literature on this topic? I am sure I am not the first person which has come up with this form of locking. Maybe we could use a name that refers to some academic paper? > >> @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ? > > That would be a question for @fisk and/or @stefank. AFAIK, the color bits should be masked by ZGC barriers *before* the oops enter the synchronization subsystem. But I kinda suspect that we are somehow triggering a ZGC bug here. Maybe we require barriers when reading oops from the lock-stack too? > > > @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ? > > > > > > That would be a question for @fisk and/or @stefank. AFAIK, the color bits should be masked by ZGC barriers _before_ the oops enter the synchronization subsystem. But I kinda suspect that we are somehow triggering a ZGC bug here. Maybe we require barriers when reading oops from the lock-stack too? > > Oops that are processed in Thread::oops_do should not have load barriers. Other oops should have load barriers. Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() which is called from Thread::oops_do(). 
But help me here: I believe ZGC processes this stuff concurrently, right? So there might be a window where the lock-stack oops would be unprocessed. The lock-stack would not go under the stack-watermark machinery. And if some code (like JVMTI deadlock detection pause) inspects the lockstack, it might see invalid oops? Is that a plausible scenario, or am I missing something? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485550661 From lmesnik at openjdk.org Mon Mar 27 18:05:14 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 27 Mar 2023 18:05:14 GMT Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2] In-Reply-To: References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> Message-ID: On Mon, 27 Mar 2023 05:51:11 GMT, David Holmes wrote: >> Frames2Test is the name of test that uses additional command-line options. It is an example. > > In that case did you mean: > > '@run driver Frames2Test -Xss4M Frames2Targ' > > ? the @run line contains only the test name and additional command-line options if needed, the target app class 'Frames2Targ' is not included. 
I have copy-pasted this example from https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/test/jdk/com/sun/jdi/Frames2Test.java#L34 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13170#discussion_r1149615514 From eosterlund at openjdk.org Mon Mar 27 18:05:47 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 27 Mar 2023 18:05:47 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: <7M8ulxHKJ_fZ-YjePl0p8om1yeJBNugeDMJd7qyVvUE=.15044eff-0f8b-4094-9834-49cf96aac1d8@github.com> On Mon, 27 Mar 2023 17:30:03 GMT, Roman Kennke wrote: > > > > @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ? > > > > > > > > > > > > That would be a question for @fisk and/or @stefank. AFAIK, the color bits should be masked by ZGC barriers _before_ the oops enter the synchronization subsystem. But I kinda suspect that we are somehow triggering a ZGC bug here. Maybe we require barriers when reading oops from the lock-stack too? > > > > > > Oops that are processed in Thread::oops_do should not have load barriers. Other oops should have load barriers. > > > > Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() which is called from Thread::oops_do(). But help me here: I believe ZGC processes this stuff concurrently, right? So there might be a window where the lock-stack oops would be unprocessed. The lock-stack would not go under the stack-watermark machinery. And if some code (like JVMTI deadlock detection pause) inspects the lockstack, it might see invalid oops? Is that a plausible scenario, or am I missing something? The JVMTI deadlock detection runs in a safepoint, doesn't it? 
Safepoints call start_processing on all threads in safepoint cleanup for non-GC safepoints. That means the lock stack oops should have been processed when the deadlock detection logic runs in a safepoint. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485592271 From rkennke at openjdk.org Mon Mar 27 18:14:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 27 Mar 2023 18:14:36 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <49yCA-Vx9caLf1KSVYnST3QsQ_kJZhny4KKt6kQnapQ=.66c7d6dd-a076-40d1-9dad-ebecf6805674@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> <49yCA-Vx9caLf1KSVYnST3QsQ_kJZhny4KKt6kQnapQ=.66c7d6dd-a076-40d1-9dad-ebecf6805674@github.com> Message-ID: On Fri, 24 Mar 2023 06:16:14 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/share/runtime/arguments.cpp line 1997: > >> 1995: } >> 1996: if (UseHeavyMonitors) { >> 1997: FLAG_SET_DEFAULT(UseFastLocking, false); > > Probably should be an error if both are set on the command-line. I am not sure. UseFastLocking (or whatever we will call it in the near future) chooses between traditional and new stack-locking. We disable both by +UseHeavyMonitors. Also, +UseFastLocking is not really meant to be used by users, at least it is my intention to mainly turn it on programmatically when compact headers (Lilliput) are turned on. When also running with +UseHeavyMonitors, then it doesn't really matter which stack-locking impl is selected, neither would actually be used. I'd rather remove the explicit turning-off of UseFastLocking under +UseHeavyMonitors. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149623912

From rkennke at openjdk.org Mon Mar 27 18:20:54 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 27 Mar 2023 18:20:54 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To: <7E8dbbZfswhEgM9yghtrXzUklVrZSCX2N15lhm7nQ_Q=.3a4a029c-bc04-4d0b-a501-a67775c1591b@github.com>
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> <7E8dbbZfswhEgM9yghtrXzUklVrZSCX2N15lhm7nQ_Q=.3a4a029c-bc04-4d0b-a501-a67775c1591b@github.com>
Message-ID:

On Fri, 24 Mar 2023 06:39:18 GMT, David Holmes wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>>  - Set condition flags correctly after fast-lock call on aarch64
>
> src/hotspot/share/runtime/synchronizer.cpp line 516:
>
>> 514:     // No room on the lock_stack so fall-through to inflate-enter.
>> 515:   } else {
>> 516:     markWord mark = obj->mark();
>
> why is it `mark` here but `header` above?

Oh, I don't know. We are very inconsistent in our nomenclature here and use mark in some places and header in others (e.g. OM::set_header() or the displaced_header() methods). The name mark is not really fitting and, I believe, only still exists for historical reasons (from when one of the primary functions of the object header was to indicate GC marking?). However, the central data type is still called markWord, so I changed my new code paths to use *mark as well. This probably warrants a round of codebase cleanup and consolidation later.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149630607

From lmesnik at openjdk.org Mon Mar 27 18:29:11 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 27 Mar 2023 18:29:11 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID: <4-FeuUY1NLDmt-xWkgjannEAjXlpkbKkr5OVQnm3eAE=.11e22d68-a321-4464-b8be-435dba3cd008@github.com>

On Mon, 27 Mar 2023 05:55:52 GMT, David Holmes wrote:

> Isn't the `-Xss4M` supposed to be passed as `-J-Xss4M`? That is what the original logic is expecting.

I haven't seen `-J-...` used in the args here; all tests just use VM flags directly. I am not sure that updating the tests is the correct option here.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1485654453

From rkennke at openjdk.org Mon Mar 27 18:51:03 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 27 Mar 2023 18:51:03 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To:
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Mon, 27 Mar 2023 17:30:03 GMT, Roman Kennke wrote:

> > > > @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ?
> > > >
> > > > That would be a question for @fisk and/or @stefank. AFAIK, the color bits should be masked by ZGC barriers _before_ the oops enter the synchronization subsystem. But I kinda suspect that we are somehow triggering a ZGC bug here. Maybe we require barriers when reading oops from the lock-stack too?
> > >
> > > Oops that are processed in Thread::oops_do should not have load barriers. Other oops should have load barriers.
> >
> > Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() which is called from Thread::oops_do(). But help me here: I believe ZGC processes this stuff concurrently, right? So there might be a window where the lock-stack oops would be unprocessed. The lock-stack would not go under the stack-watermark machinery. And if some code (like JVMTI deadlock detection pause) inspects the lockstack, it might see invalid oops? Is that a plausible scenario, or am I missing something?
>
> The JVMTI deadlock detection runs in a safepoint, doesn't it? Safepoints call start_processing on all threads in safepoint cleanup for non-GC safepoints. That means the lock stack oops should have been processed when the deadlock detection logic runs in a safepoint.

There appears to be a single code-path that inspects the lock-stack (and also the usual stack under non-fast-locking) outside of safepoints:

V [libjvm.so+0x180abd4] Threads::owning_thread_from_monitor(ThreadsList*, ObjectMonitor*)+0x54 (threads.cpp:1433)
V [libjvm.so+0x17a4bfc] ObjectSynchronizer::get_lock_owner(ThreadsList*, Handle)+0x9c (synchronizer.cpp:1109)
V [libjvm.so+0x1802db0] ThreadSnapshot::initialize(ThreadsList*, JavaThread*)+0x270 (threadService.cpp:942)
V [libjvm.so+0x1803244] ThreadDumpResult::add_thread_snapshot(JavaThread*)+0x5c (threadService.cpp:567)
V [libjvm.so+0x12a0f64] jmm_GetThreadInfo+0x480 (management.cpp:1136)
j sun.management.ThreadImpl.getThreadInfo1([JI[Ljava/lang/management/ThreadInfo;)V+0 java.management@21-internal

Curiously, this seems to be in JMX code, which is also roughly where the failure happens. I came across this code a couple of times and couldn't really tell if it is safe to do that outside of a safepoint. When in doubt I have to assume it is not, and maybe this is the source of the failure? WDYT?
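
To make the concern concrete, the kind of "which thread owns this object?" query at the top of that stack trace can be pictured as a scan over every thread's lock-stack. The following is a toy model only, not the HotSpot implementation: `Oop`, `ToyThread` and `owning_thread` are hypothetical names, and the model has no GC, no colored pointers and no safepoints.

```cpp
#include <vector>

// Toy stand-ins for HotSpot types; every name here is hypothetical.
using Oop = const void*;  // models an object reference

struct ToyThread {
    int id;
    std::vector<Oop> lock_stack;  // objects this thread has fast-locked

    // "Does this thread own obj?": a linear scan of a small array.
    bool lock_stack_contains(Oop obj) const {
        for (Oop entry : lock_stack) {
            if (entry == obj) {
                return true;
            }
        }
        return false;
    }
};

// "Which thread owns obj?": scan the lock-stack of every thread.
// Returns nullptr when no thread has obj on its lock-stack.
const ToyThread* owning_thread(const std::vector<ToyThread>& threads, Oop obj) {
    for (const ToyThread& thread : threads) {
        if (thread.lock_stack_contains(obj)) {
            return &thread;
        }
    }
    return nullptr;
}
```

The scan compares raw references, so in a real VM it can only give the right answer if the entries being compared are in a consistent, already-processed state. That is the property in question when this path runs outside of a safepoint.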
-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485692007

From eosterlund at openjdk.org Mon Mar 27 19:37:09 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 27 Mar 2023 19:37:09 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To:
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Mon, 27 Mar 2023 18:47:31 GMT, Roman Kennke wrote:

> > > > > @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ?
> > > > >
> > > > > That would be a question for @fisk and/or @stefank. AFAIK, the color bits should be masked by ZGC barriers _before_ the oops enter the synchronization subsystem. But I kinda suspect that we are somehow triggering a ZGC bug here. Maybe we require barriers when reading oops from the lock-stack too?
> > > >
> > > > Oops that are processed in Thread::oops_do should not have load barriers. Other oops should have load barriers.
> > >
> > > Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() which is called from Thread::oops_do(). But help me here: I believe ZGC processes this stuff concurrently, right? So there might be a window where the lock-stack oops would be unprocessed. The lock-stack would not go under the stack-watermark machinery. And if some code (like JVMTI deadlock detection pause) inspects the lockstack, it might see invalid oops? Is that a plausible scenario, or am I missing something?
> >
> > The JVMTI deadlock detection runs in a safepoint, doesn't it? Safepoints call start_processing on all threads in safepoint cleanup for non-GC safepoints. That means the lock stack oops should have been processed when the deadlock detection logic runs in a safepoint.
>
> There appears to be a single code-path that inspects the lock-stack (and also the usual stack under non-fast-locking) outside of safepoints:
>
> V [libjvm.so+0x180abd4] Threads::owning_thread_from_monitor(ThreadsList*, ObjectMonitor*)+0x54 (threads.cpp:1433)
> V [libjvm.so+0x17a4bfc] ObjectSynchronizer::get_lock_owner(ThreadsList*, Handle)+0x9c (synchronizer.cpp:1109)
> V [libjvm.so+0x1802db0] ThreadSnapshot::initialize(ThreadsList*, JavaThread*)+0x270 (threadService.cpp:942)
> V [libjvm.so+0x1803244] ThreadDumpResult::add_thread_snapshot(JavaThread*)+0x5c (threadService.cpp:567)
> V [libjvm.so+0x12a0f64] jmm_GetThreadInfo+0x480 (management.cpp:1136)
> j sun.management.ThreadImpl.getThreadInfo1([JI[Ljava/lang/management/ThreadInfo;)V+0 java.management@21-internal
>
> Curiously, this seems to be in JMX code, which is also roughly where the failure happens. I came across this code a couple of times and couldn't really tell if it is safe to do that outside of a safepoint. When in doubt I have to assume it is not, and maybe this is the source of the failure? WDYT?

Could be. When not running in a handshake or safepoint, you need to call start_processing manually on the target thread, which will ensure the oops are fixed until the next safepoint poll.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485752396

From rkennke at openjdk.org Mon Mar 27 20:24:14 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 27 Mar 2023 20:24:14 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word.
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking.
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact.
But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Ensure safepoint when processing lock-stack

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/37f061b0..32fdda25

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=29
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=28-29

  Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From lmesnik at openjdk.org Mon Mar 27 23:31:34 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 27 Mar 2023 23:31:34 GMT
Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC [v2]
In-Reply-To:
References: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com>
Message-ID:

On Tue, 21 Mar 2023 22:38:18 GMT, Chris Plummer wrote:

>> There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which noticed (rarely) OOME on macOS even when not using ZGC. From JDK-8304449:
>>
>> "macOS has a thread behavior that is not seen on linux and windows that is causing more memory usage, which sometimes leads to this unexpected OOME. The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500.
On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keeps track of threads (ThreadReferences) for which a ThreadStartEvent has been received but a ThreadDeathEvent has not. On linux and windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result."
>>
>> The `threads` collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. I got rid of the `threads` collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents.
>
> Chris Plummer has updated the pull request incrementally with one additional commit since the last revision:
>
>   get rid of some locals that are not needed

Marked as reviewed by lmesnik (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/13130#pullrequestreview-1360001225

From dlong at openjdk.org Tue Mar 28 00:20:59 2023
From: dlong at openjdk.org (Dean Long)
Date: Tue, 28 Mar 2023 00:20:59 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable.
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). 
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? 
| 541.583 | 550.059 | -1.54%
>> PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11%
>> FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40%
>> FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12%
>> ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34%
>> Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40%
>> RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74%
>> Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24%
>> ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58%
>> ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53%
>> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37%
>> Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97%
>> FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39%
>> FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04%
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Ensure safepoint when processing lock-stack

src/hotspot/cpu/aarch64/aarch64.ad line 3844:

> 3842:
> 3843:   // Indicate success on completion.
> 3844:   __ cmp(oop, oop); // Force ZF=1 to indicate success.

It looks like `fast_lock` already sets ZF=1 on success/fall-through. Why not document this as part of the interface? Then this `cmp` would be redundant.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149902898

From dlong at openjdk.org Tue Mar 28 00:34:00 2023
From: dlong at openjdk.org (Dean Long)
Date: Tue, 28 Mar 2023 00:34:00 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. 
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? 
| 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Ensure safepoint when processing lock-stack src/hotspot/cpu/aarch64/aarch64.ad line 3953: > 3951: > 3952: // Indicate success on completion. > 3953: __ cmp(oop, oop); Redundant, based on current implementation of `fast_unlock`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149908732 From sspitsyn at openjdk.org Tue Mar 28 02:37:31 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Mar 2023 02:37:31 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> <0J5IbRtwgK7aOiXWfWBSVLo0RniWf6rRfvqmqA59j5A=.a7cf4ca2-a750-4281-9c78-285ca58e21da@github.com> Message-ID: On Fri, 24 Mar 2023 19:09:57 GMT, Chris Plummer wrote: >> It is for yielding. Do you think we need this with a bigger frequency? > > I guess the question then is why the need to yield. It just seems a bit odd that I thought the main point of this loop was to keep busy calling `breakpointCheck()`, and I don't see how doing a yield 1 out of every 1,000,000 iterations relates to that. Thank you for pointing to the `breakpointCheck()` as it is not needed at all, so I've removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1149962945 From dlong at openjdk.org Tue Mar 28 02:47:59 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 28 Mar 2023 02:47:59 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30] In-Reply-To: References: Message-ID: On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
>> [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Ensure safepoint when processing lock-stack

src/hotspot/share/interpreter/interpreterRuntime.cpp line 740:

> 738: if (!UseHeavyMonitors && UseFastLocking) {
> 739: // This is a hack to get around the limitation of registers in x86_32.
We really
> 740: // send an oopDesc* instead of a BasicObjectLock*.

I don't understand what this is referring to. Trying to avoid passing an extra argument?

src/hotspot/share/interpreter/interpreterRuntime.cpp line 746:

> 744: ObjectSynchronizer::enter(h_obj, nullptr, current);
> 745: return;
> 746: }

Why not put this code in a new function declared as InterpreterRuntime::monitorenter(JavaThread* current, oop obj) and have the caller decide which one to call?

src/hotspot/share/oops/oop.cpp line 123:

> 121: // Header verification: the mark is typically non-zero. If we're
> 122: // at a safepoint, it must not be zero. fast-locking does allow the
> 123: // mark to be zero at a safepoint.

Suggestion:

// at a safepoint, it must not be zero, except when UseFastLocking is turned on.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149918325
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149918510
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149967043

From dlong at openjdk.org Tue Mar 28 02:48:01 2023
From: dlong at openjdk.org (Dean Long)
Date: Tue, 28 Mar 2023 02:48:01 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To: <49yCA-Vx9caLf1KSVYnST3QsQ_kJZhny4KKt6kQnapQ=.66c7d6dd-a076-40d1-9dad-ebecf6805674@github.com>
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> <49yCA-Vx9caLf1KSVYnST3QsQ_kJZhny4KKt6kQnapQ=.66c7d6dd-a076-40d1-9dad-ebecf6805674@github.com>
Message-ID:

On Fri, 24 Mar 2023 06:15:31 GMT, David Holmes wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>> - Set condition flags correctly after fast-lock call on aarch64
>
> src/hotspot/share/oops/oop.cpp line 126:
>
>> 124: // Outside of a safepoint, the header could be changing (for example,
>> 125: // another thread could be inflating a lock on this object).
>> 126: if (ignore_mark_word || UseFastLocking) {
>
> Not clear why UseFastLocking appears here instead of in the return expression - especially given the comment above.

I have the same question. Based on the comment, I would expect `return !SafepointSynchronize::is_at_safepoint() || UseFastLocking`;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149966288

From sspitsyn at openjdk.org Tue Mar 28 02:51:34 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 28 Mar 2023 02:51:34 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5]
In-Reply-To:
References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> <0J5IbRtwgK7aOiXWfWBSVLo0RniWf6rRfvqmqA59j5A=.a7cf4ca2-a750-4281-9c78-285ca58e21da@github.com>
Message-ID:

On Fri, 24 Mar 2023 19:13:39 GMT, Chris Plummer wrote:

>> It is needed to balance enabling/disabling notifyJvmti mode with the ThreadStart/VirtualThreadStart events.
>> Otherwise, many mode switches can be observed without any events which is not interesting.
>> We need to allow virtual threads to execute a little bit after a mode switch.
>
> Shouldn't that be the caller's responsibility? Including a comment would be helpful.

Okay, moved to the caller. In fact, I've refactored the test even more for safety purposes. Will describe it separately.

>> Good question. We almost always do it in the JVMTI tests including `serviceability/jvmti/vthread` and `vmTestbase/nsk/jvmti` tests. Examples are 22 `serviceability/jvmti/vthread` tests.
>
> Are you saying it's not needed, but you included it to be consistent with other tests?

I remember it was really needed some time ago but do not remember why. Removed it now. Then we have a technical debt to get rid of this in all JVMTI tests with the `-agentlib` option.

>> I'm not sure if it is really needed.
60 virtual threads are started.
>> Some of them are executed long enough before shutdown.
>> We can just increase the number of threads if necessary.
>
> ok

I've refactored the test, and it has impacted this fragment. Please, see my comment above.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1149968287
PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1149967613
PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1149969217

From dholmes at openjdk.org Tue Mar 28 02:52:00 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 28 Mar 2023 02:52:00 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID: <2cIUWaQL9GilRFtckC9SpcJVet_0Rb8SmFS1tfe8AWE=.35713c3e-1f5d-45e5-8a3c-d732070d7b81@github.com>

On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation.
>> [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Ensure safepoint when processing lock-stack

src/hotspot/share/runtime/threads.cpp line 1433:

> 1431:
> 1432: JavaThread* Threads::owning_thread_from_monitor(ThreadsList* t_list, ObjectMonitor* monitor) {
> 1433: assert(SafepointSynchronize::is_at_safepoint(), "not safe outside of safepoint");

Shouldn't this be gated on UseFastLocking?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149969081

From dlong at openjdk.org Tue Mar 28 03:03:01 2023
From: dlong at openjdk.org (Dean Long)
Date: Tue, 28 Mar 2023 03:03:01 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
>> [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Ensure safepoint when processing lock-stack

src/hotspot/share/runtime/objectMonitor.inline.hpp line 36:

> 34: #include "runtime/synchronizer.hpp"
> 35:
> 36: inline intptr_t ObjectMonitor::is_entered(JavaThread* current) const {

Suggestion:

inline bool ObjectMonitor::is_entered(JavaThread* current) const {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149974098

From dholmes at openjdk.org Tue Mar 28 03:03:00 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 28 Mar 2023 03:03:00 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID: <_b88qICP5AK4NgLpW0fXKgoa8ObWoZM1GvXKpoNMxlU=.85187d4a-968e-4283-b46d-4290ee3cc402@github.com>

On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. 
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Ensure safepoint when processing lock-stack src/hotspot/share/runtime/lockStack.hpp line 43: > 41: // efficient addressing in generated code. > 42: int _offset; > 43: oop _base[CAPACITY]; Should we be using `OopHandle` here rather than raw oops? Would that not avoid issues with scanning the lock-stack only during safepoints? Another alternative for a STW safepoint would be to do a handshake with the target threads. 
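
For readers following the lock-stack discussion, a toy model of the per-thread structure described in the PR summary may help. This is only a minimal Java sketch under stated assumptions, not HotSpot code: the class name `LockStackSketch`, its method names, the initial capacity of 4 and the strictly LIFO `pop` are all hypothetical and chosen for illustration.

```java
import java.util.Arrays;

/** Toy model of a per-thread lock-stack; all names here are hypothetical, not HotSpot's. */
final class LockStackSketch {
    // Small array of object references; assumed initial capacity, grown on demand.
    private Object[] base = new Object[4];
    private int top = 0;

    /** Records that the owning thread has fast-locked obj, growing the array if it is full. */
    void push(Object obj) {
        if (top == base.length) {
            base = Arrays.copyOf(base, base.length * 2);
        }
        base[top++] = obj;
    }

    /** Removes obj, which must be the most recent entry (balanced locking is assumed). */
    void pop(Object obj) {
        if (top == 0 || base[top - 1] != obj) {
            throw new IllegalStateException("unbalanced unlock");
        }
        base[--top] = null;
    }

    /** Answers "does this thread own obj?" with a linear scan by identity. */
    boolean contains(Object obj) {
        for (int i = top - 1; i >= 0; i--) {
            if (base[i] == obj) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns false for a recursive attempt. In this sketch that stands in for
     * "fall back to a full monitor"; no inflation is actually modelled.
     */
    boolean tryFastLock(Object obj) {
        if (contains(obj)) {
            return false;
        }
        push(obj);
        return true;
    }

    int size() {
        return top;
    }
}
```

The linear scan in `contains` reflects the claim that the stack typically stays very small. Nothing here models GC roots, safepoints or the header CAS, which are the actual subject of the review comment.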
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149972911

From sspitsyn at openjdk.org Tue Mar 28 03:04:32 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 28 Mar 2023 03:04:32 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5]
In-Reply-To:
References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com>
Message-ID: <2rgTBRBpumpvNwpss9mVY6nySwat95LU9fJ6y04hR-Y=.7f81090b-37fd-4fc2-9b9e-92d29a771481@github.com>

On Fri, 24 Mar 2023 02:00:57 GMT, Serguei Spitsyn wrote:

>> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1581:
>>
>>> 1579: return false;
>>> 1580: }
>>> 1581: if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) {
>>
>> shouldn't be
>> if (!JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) {
>> here?
>
> This is a nice catch, thanks! Fixed it now and discovered some issues which I had not seen before. It turned out that disabling notifyJvmti events while virtual threads are executing is unsafe. It is not easy to fix these issues and there is no real need to, so I've refactored the test to do multiple testing cycles. Each testing cycle disables notifyJvmti events when no virtual threads are executing, then starts a number of threads, then enables notifyJvmti events and shuts down the virtual threads. The test was also extended to post more JVMTI events: `VirtualThreadStart`, `VirtualThreadEnd`, `ThreadStart` and `ThreadEnd`. Also, I saw some intermittent crashes with double-deallocation of JvmtiThreadStates which belong to vthreads, so I have extended VM_InitNotifyJvmtiEventsMode to do more corrections. Will push my fixes after my mach5 test runs are finished.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1149975274

From fyang at openjdk.org Tue Mar 28 03:13:47 2023
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 28 Mar 2023 03:13:47 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v14]
In-Reply-To:
References:
Message-ID: <7-oX5-8j4W1x_mcyc4Hwcluqyb2_aBjNjSZLdJNT2DA=.71d06f76-e6c2-4f07-80d3-00ddc00101ab@github.com>

On Mon, 27 Mar 2023 14:43:04 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>>
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   RISCV patch and aarch64 improvement

Updated RISC-V part looks good.

-------------

Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1360137045

From dlong at openjdk.org Tue Mar 28 03:14:02 2023
From: dlong at openjdk.org (Dean Long)
Date: Tue, 28 Mar 2023 03:14:02 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke wrote:

>> [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Ensure safepoint when processing lock-stack

src/hotspot/share/runtime/synchronizer.cpp line 506:

> 504: return;
> 505: }
> 506: // Otherwise retry.

Why is retry important for the new code but not the old code?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149978809

From greggwon at cox.net Tue Mar 28 03:34:02 2023
From: greggwon at cox.net (Gregg Wonderly)
Date: Mon, 27 Mar 2023 22:34:02 -0500
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To: <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com>
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com>
Message-ID:

On Mar 27, 2023, at 11:30 AM, Andrew Dinn wrote:

> If this is pushed in jdk21 then anyone currently developing or upgrading an app to target jdk21 will only have been able to test on jdk17-jdk20 where they will not encounter the issue. So, this only leaves them a small window to detect that there will be a problem using agents in jdk21. When jdk21 arrives this may force them to delay deployment or they may even deploy unaware that the problem exists.
>
> If this is pushed in jdk22 instead of jdk21 then anyone who upgrades from jdk17 to jdk21 will not have a problem. Anyone working on an app for deployment on jdk25 will have the opportunity to test on 3 non-LTS releases which might manifest the potential agent problem before deployment.

This is, again, where the reality that some Java users live in is very different from what seems to be known by many decision makers here. Most corporate users of Java don't control when a particular version of Java is deployed into their environments.
It keeps being proposed that users are somehow deploying a specific version of Java and getting an appropriate version of an application they use, all at one moment in time. The supposition that Java is "deployed" with a particular software system that uses it is simply false. Even Linux releases by Red Hat for RHEL, CentOS or even Fedora don't let you pick any and every version of Java.

Java applications, by the millions, were written without needing any particular version of Java, until a version broke something major: starting with Java 2 (1.2), then Java 1.4, which needed a dozen fixes, and then Java 1.5, which broke huge numbers of desktop apps that had not used volatile class values for so many things, including loop control values, which kept loops from exiting. Then we had 1.9, which almost went out the door disabling every single Spring app in existence. And there are more and more things happening that just do not make much sense in the grand scheme of things.

Overall, Java is just not a safe place to take people for the first time. Many have had horrid problems and given up on Java. For Java 1.2, Perforce invested huge time to try and create a new desktop app for their SCM system. They got to the point of almost releasing a beta to testers and then summarily threw it all away because they just could not make it work with all the things that got broken in 1.2.

At my business, we have lots of each device applications where it would be a good thing to use, but because of the breakage and issues others have experienced with Java over the years, their experiences cause them to just say no to anything Java.

Java is just randomly upgraded on people's desktops, in their view. It gets replaced by the IT staff at most corps at moments unrelated to when they start using a particular Java application. Those corps and their IT staff have little to no knowledge of what every Java application is, let alone how it might depend on features that are being changed at each release of the JDK/JVM. The end result is that, for this class of user, it is always a surprise which version of Java will be available and which application will break this time.

I still have lots of "jar" file applications that I share with others, and they just double click on those to run them. It's that class of user that this "Java upgrades happen with software updates/distributions" process completely overlooks. The Java licensing was always that you could not use Java as the sole application platform on a computer. So all kinds of "free" desktop applications (and applets and Java Web Start applications) were created and used by literally thousands of users who are completely out of sight.

I continue to see massive migration away from Java as the first choice for new applications amongst developers I talk to. It's not being taught to most new developers I meet. They hardly even know that Java exists or what it's capable of. Most developers seem to be taught web front end development tools or .NET or golang or other languages besides Java for backend dev. There seems to be little chance that Java will have a place in the landscape within the next 5 years or so, as those who have used Java since the mid 1990s age out of the pool of active developers and are no longer influencing the tech used.

What a sad tale the last 10 years of Java has been compared to what was possible 25 years ago, and should have happened...
Gregg Wonderly

From duke at openjdk.org Tue Mar 28 04:24:39 2023
From: duke at openjdk.org (Eirik Bjorsnos)
Date: Tue, 28 Mar 2023 04:24:39 GMT
Subject: Integrated: 8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java
In-Reply-To:
References:
Message-ID:

On Mon, 20 Mar 2023 19:47:10 GMT, Eirik Bjorsnos wrote:

> Please review this PR which replaces the use of outdated JVM flags for setting up debugging in the test value004.java
>
> This is part of an ongoing effort to remove use of the outdated flag '-Djava.compiler' such that the option itself can eventually be removed.

This pull request has now been integrated.

Changeset: 4f625c0b
Author: Eirik Bjorsnos
Committer: David Holmes
URL: https://git.openjdk.org/jdk/commit/4f625c0b9aed5ecd1d6f1dae824a007680fe1d8b
Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod

8304543: Modernize debugging jvm args in test/hotspot/jtreg/vmTestbase/nsk/jdi/Argument/value/value004.java

Reviewed-by: dholmes, cjplummer, alanb

-------------

PR: https://git.openjdk.org/jdk/pull/13107

From dholmes at openjdk.org Tue Mar 28 04:27:35 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 28 Mar 2023 04:27:35 GMT
Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 24 Mar 2023 10:35:36 GMT, Johannes Bechberger wrote:

>> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace.
>>
>> Tested on my M1 mac.
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Remove misc lines
>  - Disable caching in ASGCT

Changes requested by dholmes (Reviewer).
src/hotspot/share/runtime/thread.hpp line 641:

> 639: void set_in_asgct(bool value) { _in_asgct = value; }
> 640: static bool current_in_asgct() {
> 641: Thread *cur = Thread::current();

You need to use `current_or_null_safe` here as you may be in a signal handling context.

src/hotspot/share/runtime/thread.hpp line 651:

> 649: public:
> 650: ThreadInAsgct(Thread* thread) : _thread(thread) {
> 651: assert(thread != NULL, "invariant");

s/NULL/nullptr/

-------------

PR Review: https://git.openjdk.org/jdk/pull/13144#pullrequestreview-1360181938
PR Review Comment: https://git.openjdk.org/jdk/pull/13144#discussion_r1150010290
PR Review Comment: https://git.openjdk.org/jdk/pull/13144#discussion_r1150010653

From dholmes at openjdk.org Tue Mar 28 04:30:29 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 28 Mar 2023 04:30:29 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

On Mon, 27 Mar 2023 18:02:05 GMT, Leonid Mesnik wrote:

>> In that case did you mean:
>>
>> '@run driver Frames2Test -Xss4M Frames2Targ'
>>
>> ?
>
> the @run line contains only the test name and additional command-line options if needed, the target app class 'Frames2Targ' is not included. I have copy-pasted this example from
> https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/test/jdk/com/sun/jdi/Frames2Test.java#L34

That is very confusing because the `Frames2Targ` seems to appear out of nowhere.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13170#discussion_r1150012428

From dholmes at openjdk.org Tue Mar 28 04:37:28 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 28 Mar 2023 04:37:28 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To: <4-FeuUY1NLDmt-xWkgjannEAjXlpkbKkr5OVQnm3eAE=.11e22d68-a321-4464-b8be-435dba3cd008@github.com>
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com> <4-FeuUY1NLDmt-xWkgjannEAjXlpkbKkr5OVQnm3eAE=.11e22d68-a321-4464-b8be-435dba3cd008@github.com>
Message-ID:

On Mon, 27 Mar 2023 18:25:52 GMT, Leonid Mesnik wrote:

> > Isn't the `-Xss4M` supposed to be passed as `-J-Xss4M`? That is what the original logic is expecting.
>
> I haven't seen that -J-.. is used here in the args, all tests just use vm flags directly. I am not sure if updating tests is the correct option here.

Well, whoever wrote the original parsing code expected that VM args would be passed using -J AFAICS. And if handling of non -J VM args is currently broken (per this PR) then how are those other tests working?

Also note that if a test tries to set the classpath without using the `-J-classpath ` form then your code will not work correctly as the `` would be treated as the class name.
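
To make the parsing concern concrete, here is a minimal Java sketch of inserting a wrapper class between leading VM options and the application class name. It is not the actual `TestScaffold.parseArgs` code: the class `WrapperInsertionSketch`, the method `insertWrapper`, the wrapper name passed in, and the (deliberately incomplete) set of options assumed to take a separate value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not the TestScaffold implementation. */
final class WrapperInsertionSketch {
    // Options assumed to consume the NEXT argument as their value (illustrative, not exhaustive).
    private static final Set<String> TAKES_VALUE =
            Set.of("-cp", "-classpath", "--class-path", "--module-path", "--add-exports", "--add-opens");

    /** Returns a copy of args with wrapper placed just before the first non-option argument. */
    static List<String> insertWrapper(List<String> args, String wrapper) {
        List<String> out = new ArrayList<>();
        int i = 0;
        // Copy leading options, keeping a separate option value attached to its option.
        while (i < args.size() && args.get(i).startsWith("-")) {
            String opt = args.get(i++);
            out.add(opt);
            if (TAKES_VALUE.contains(opt) && i < args.size()) {
                out.add(args.get(i++));
            }
        }
        // The first remaining argument is taken to be the application class name.
        if (i < args.size()) {
            out.add(wrapper);
        }
        while (i < args.size()) {
            out.add(args.get(i++));
        }
        return out;
    }
}
```

Without the `TAKES_VALUE` check, the value following `-classpath` would be mistaken for the class name, which is the failure mode described above. A hard-coded set like this is fragile, since any option that takes a separate argument and is missing from it reproduces the same problem.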
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
>   added comments and trim arguments

Also note that both the old code and the new code won't correctly deal with other VM options that take separate arguments like classpath does, e.g. `--add-exports `

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1486207223

From lmesnik at openjdk.org Tue Mar 28 04:47:36 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Tue, 28 Mar 2023 04:47:36 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

On Tue, 28 Mar 2023 04:27:40 GMT, David Holmes wrote:

>> the @run line contains only the test name and additional command-line options if needed, the target app class 'Frames2Targ' is not included. I have copy-pasted this example from
>> https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/test/jdk/com/sun/jdi/Frames2Test.java#L34
>
> That is very confusing because the `Frames2Targ` seems to appear out of nowhere.

Well, I could update the comment, but it is basically how all the JDI tests work. The test adds the target by itself after all command line arguments.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13170#discussion_r1150020120

From lmesnik at openjdk.org Tue Mar 28 05:02:31 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Tue, 28 Mar 2023 05:02:31 GMT
Subject: RFR: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[]) [v2]
In-Reply-To:
References: <7PDwD7zFu8CYojj6LDm0qVMlVHKuJNmAzxKaWdjGuvY=.cf39e296-777b-4653-bc19-7276a7da28e8@github.com>
Message-ID:

On Fri, 24 Mar 2023 06:31:14 GMT, Leonid Mesnik wrote:

>> The TestScaffold incorrectly parses options; it should insert the wrapper class between the VM options and the application's class name.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
>   added comments and trim arguments

Yes, you are right. The ability to process options is very limited. It would probably be better to get rid of setting VM flags using driver test parameters completely. Let me check if it could be implemented.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13170#issuecomment-1486217781

From amitkumar at openjdk.org Tue Mar 28 08:16:55 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 28 Mar 2023 08:16:55 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v14]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Mar 2023 14:43:04 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>>
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   RISCV patch and aarch64 improvement

Hi Matias,

The s390x port is almost complete. All builds are successful and the tier1 tests for fastdebug are complete. For other builds, tests are in progress.

Please don't integrate yet; wait for us.

Thanks

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1486414339

From adinn at redhat.com Tue Mar 28 09:13:17 2023
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 28 Mar 2023 10:13:17 +0100
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com>
Message-ID:

Gregg,

I won't respond point by point to your comments as I cannot see any great value in doing so.

I really only want to make one general comment about your account below, which is that you appear to me to be relaying your own experience as a desktop Java user and universalising it to all users and uses of Java.

While I acknowledge that you are correct to state that there can be problems with maintaining a consistent desktop setup for Java I'll counter that with two important qualifying arguments which undercut your account.

Firstly, the problems that you describe are general ones that apply when it comes to software configuration management for a highly user-specific and often, in consequence, highly variable environment like a desktop. They are not problems that are specific to Java or other language runtimes.
Even within that category many runtimes, especially language runtimes, suffer from problems with installed version mismatches and in many cases these have been notably far worse problems than Java (as for example the Python 2/3 fiasco mentioned by Andrew Haley).

However, that is not to say that the problem lies with the runtime itself. Version management problems cannot simply be resolved by preserving Java (or any other deployed software) in aspic. At least some minimal level of upgrade of a runtime like Java is needed to deal with emerging security and critical functional problems. However, in the longer term any platform will also need to incorporate larger scale modifications in order to cater for the continuous, dramatic change that we have seen and continue to see in all hardware and operating systems. Java has not stood still over the last 25 years for very good reasons.

Your reply suggests that you are unaware of the reality that those who manage non-desktop deployments plan very carefully around this need to adapt deployments to updates. Your lament that (your and others') desktop management does not include such provision may reflect the reality of some (but definitely not all) individuals or organizations. However, that lack of provision attests not to any failing on the part of the developers of Java but rather to a lack of organization, understanding and adequate preparation for *necessary maintenance* on the part of those responsible for managing said desktops.

Which brings us to the second point: your complaint omits to allow for the enormous efforts that Java developers perform to enable Java users to rely on and profit from exactly the sort of continuity that you misguidedly claim Java does not provide. We are currently maintaining reliable, secure and bug free versions of jdk8, jdk11, jdk17 which allow users to continue to run applications that were originally deployed many years ago and will do so for many years to come.
Indeed, as with many other large-scale, organized open source software infrastructure projects, this is the primary focus of the OpenJDK team. The number of people involved in maintaining legacy releases of Java to support existing users far outweighs those involved in developing new releases and new features.

Users who put in the work needed to manage the configuration of their desktop environments can easily use these legacy releases to maintain their own desktop applications. It's not a free lunch -- admins of the desktop systems need to have at least a moderate understanding of how to configure their systems in order to maintain applications that rely on a specific Java platform release. However, to claim that the OpenJDK devs have not made this possible, worse to claim that Java has actually poisoned the well for desktop users, is a ridiculous and ignorant assertion.

regards,


Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

On 28/03/2023 04:34, Gregg Wonderly wrote:
> On Mar 27, 2023, at 11:30 AM, Andrew Dinn wrote:
>> If this is pushed in jdk21 then anyone currently developing or upgrading an app to target jdk21 will only have been able to test on jdk17-jdk20 where they will not encounter the issue. So, this only leaves them a small window to detect that there will be a problem using agents in jdk21. When jdk21 arrives this may force them to delay deployment or they may even deploy unaware that the problem exists.
>>
>> If this is pushed in jdk22 instead of jdk21 then anyone who upgrades from jdk17 to jdk21 will not have a problem. Anyone working on an app for deployment on jdk25 will have the opportunity to test on 3 non-LTS releases which might manifest the potential agent problem before deployment.
>
> This is, again, where the reality that some Java users live in is very different than what seems to be known by many decision makers here. Most corporate users of Java don't control when a particular version of Java is deployed into their environments. It keeps being proposed that somehow users are deploying a specific version of Java and getting an appropriate version of an application they use, all at one moment in time. The supposition that Java is "deployed" with a particular software system that uses it is summarily false. Even Linux releases by Redhat for RHEL, Centos or even Fedora don't let you pick any and every version of Java. Java applications, by the millions, were written without needing any particular version of Java, until a version broke something major, starting at Java2 (1.2), and then Java1.4, which needed a dozen fixes, and then Java 1.5, which broke huge numbers of desktop apps that had not used volatile class values for so many things, including loop control values that kept loops from exiting. Then we had 1.9 that almost went out the door disabling every single Spring app in existence. And there are more and more things happening that just do not make much sense in the grand scheme of things.
>
> Overall Java is just not a safe place to take people for the first time. Many have had horrid problems and given up on Java. For Java1.2, Perforce invested huge time to try and create a new desktop app for their SCM system. They got to the point of almost releasing a beta to testers and then summarily threw it all away because they just could not make it work for all the things that got broken in 1.2. At my business, we have lots of each device applications where it would be a good thing to use, but because of the breakage and issues others have experienced with Java over the years, their experiences cause them to just say no to anything Java.
>
> Java is just randomly upgraded on people's desktops in their view.
It gets replaced by the IT staff at most corps at moments unrelated to when they start using a particular Java application. Those corps and their IT staff have little to no knowledge of what every Java application is, let alone how it might be dependent on features that are being changed at each release of the JDK/JVM.
>
> The end result is that it is always a surprise, for this class of user, which version of Java will be available and which application will break this time.
>
> I still have lots of "jar" file applications that I share with others and they just double click on those to run them. It's that class of user that this process of Java upgrades happening with software updates/distributions completely overlooks. The Java licensing was always about you could not use Java as the sole application platform on a computer. So, all kinds of "free" desktop apps (and applets and Java Web Start) applications were created and used by literally thousands of users that are completely out of sight.
>
> I continue to see massive migration away from Java as the first choice for new applications amongst developers I talk to. It's not being taught to most new developers I meet. They hardly even know that Java exists or what it's capable of. Most developers seem to be taught web front end development tools or .net or golang or other languages besides Java for backend dev.
>
> There seems to be little chance that Java will have a place in the landscape within the next 5 years or so as those who have used Java since the mid 1990s age out of the pool of active developers and are no longer influencing tech used.
>
> What a sad tale the last 10 years of Java has been compared to what was possible 25 years ago, and should have happened.
>
> Gregg Wonderly
>

From rkennke at openjdk.org Tue Mar 28 10:28:00 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 28 Mar 2023 10:28:00 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v31]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed-up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  (x86_32) Use existing thread register in fast_unlock() instead of fetching thread into a tmp register

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/32fdda25..d4da05d9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=30
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=29-30

  Stats: 8 lines in 2 files changed: 2 ins; 5 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Tue Mar 28 10:35:26 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 28 Mar 2023 10:35:26 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v32]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked').
The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed-up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol.
This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Use long branches at start of fast_lock() and fast_unlock() unconditionally

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/d4da05d9..7ccc4b63

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=31
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=30-31

  Stats: 7 lines in 1 file changed: 0 ins; 5 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Tue Mar 28 10:35:29 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 28 Mar 2023 10:35:29 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v32]
In-Reply-To:
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Wed, 22 Mar 2023 18:05:49 GMT, Vladimir Kozlov wrote:

>> I don't think so, unless we also want to change all the stuff in x86_32.ad to not fetch the thread before calling into fast_unlock(). However, I think it is a nice and useful change. I could break it out of this PR and get it reviewed separately, it is a side-effect of the new locking impl insofar as we always require a thread register, and allocate&fetch it before going into fast_lock().
>
> I missed that it is under #ifndef LP64. Yes, it makes sense since you are now passing `thread` in register.
> And why we need to `get_thread()` at line 708 if we already have it?
>
> It is still hard to follow this 32-bit code. What each register is holding, what is value `3` and why we don't have checks similar to LP64 code after CAS?

Good point about using the existing thread register.
I am not sure why we use '3' in x86_32 instead of the 1 (unlocked_value) that we are using elsewhere. It probably doesn't matter as long as it is not 0 which indicates recursive locking.
We don't do extra work after the CAS in x86_32 because x86_64 additionally optimizes the recursive case, while x86_32 drops into the slow-path there.

It might be worth investigating if we could consolidate the two implementations and maybe even merge them into a single one. But perhaps not as part of the new locking effort?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150373315

From sspitsyn at openjdk.org Tue Mar 28 10:45:33 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 28 Mar 2023 10:45:33 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v6]
In-Reply-To:
References:
Message-ID:

> The fix is to enable virtual threads support for late binding JVMTI agents.
> The fix includes:
> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which enables JVMTI VTMS transition notifications when an agent is loaded into a running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol.
> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed the rest of review comments chunk ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/89b659d7..f8a7b18b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=04-05 Stats: 215 lines in 7 files changed: 153 ins; 29 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From rkennke at openjdk.org Tue Mar 28 10:55:16 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 28 Mar 2023 10:55:16 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 06:55:55 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > 
src/hotspot/share/runtime/threads.cpp line 1422: > >> 1420: } >> 1421: >> 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, oop obj) { > > Is this thread-safe? My last commit changed that code to only run during safepoints. It should be safe now, and I added an assert that verifies that it is only run at safepoint. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150401225 From alanb at openjdk.org Tue Mar 28 10:55:41 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 28 Mar 2023 10:55:41 GMT Subject: RFR: 8304919: Implementation of Virtual Threads Message-ID: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. In addition, there are a small number of implementation changes to sync up from the loom fibers branch: - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. 
- New system property to print a trace dump when virtual thread sets its own value of a TL. - ThreadPerTaskExecutor is changed to use FutureTask. Testing: tier1-6. ------------- Commit messages: - Merge - Initial sync from fibers branch Changes: https://git.openjdk.org/jdk/pull/13203/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304919 Stats: 1964 lines in 198 files changed: 698 ins; 760 del; 506 mod Patch: https://git.openjdk.org/jdk/pull/13203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13203/head:pull/13203 PR: https://git.openjdk.org/jdk/pull/13203 From rkennke at openjdk.org Tue Mar 28 11:03:35 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 28 Mar 2023 11:03:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v33] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
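As a reading aid for the paragraph above (this is not code from the PR), here is a minimal C++ sketch of the legacy stack-locking idea: save the displaced header in a stack slot, then CAS the slot's address into the object header so that the lowest two bits read 00. All names (`Object`, `StackLock`, `stack_lock`, `owned_by_stack`, the `k...` constants) are hypothetical stand-ins, and the sketch ignores recursion, inflation and memory-ordering tuning.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative constants; the names are made up for this sketch.
constexpr std::uintptr_t kLockBitsMask = 0b11;  // lowest two header bits
constexpr std::uintptr_t kStackLocked  = 0b00;  // upper bits point into a thread stack
constexpr std::uintptr_t kUnlocked     = 0b01;  // neutral (unlocked) header

// Stand-in for a Java object: only the header word matters here.
struct Object {
  std::atomic<std::uintptr_t> header{kUnlocked};
};

// Stand-in for the on-stack lock record: it holds only the displaced header.
struct StackLock {
  std::uintptr_t displaced_header = 0;
};
static_assert(alignof(StackLock) >= 4, "slot address must have its low two bits clear");

// Try to stack-lock obj; returning false means "take the slow path".
bool stack_lock(Object& obj, StackLock& slot) {
  std::uintptr_t h = obj.header.load();
  if ((h & kLockBitsMask) != kUnlocked) return false;  // already locked or inflated
  slot.displaced_header = h;                           // save the original header
  // Install the slot address; alignment guarantees the low bits are 00.
  return obj.header.compare_exchange_strong(h, reinterpret_cast<std::uintptr_t>(&slot));
}

// Undo the lock by swapping the displaced header back in.
bool stack_unlock(Object& obj, StackLock& slot) {
  std::uintptr_t expected = reinterpret_cast<std::uintptr_t>(&slot);
  return obj.header.compare_exchange_strong(expected, slot.displaced_header);
}

// Ownership query: does the header point into the stack range [lo, hi)?
bool owned_by_stack(const Object& obj, std::uintptr_t lo, std::uintptr_t hi) {
  std::uintptr_t h = obj.header.load();
  return (h & kLockBitsMask) == kStackLocked && h >= lo && h < hi;
}
```

The point of the sketch is the overloading the description complains about: while the object is stack-locked, the real header lives in the stack slot, so any reader of the header has to cope with that state.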
> > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address some of @dholmes-ora' review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/7ccc4b63..de45c08e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=31-32 Stats: 54 lines in 5 files changed: 19 ins; 3 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Tue Mar 28 11:10:34 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Mar 2023 11:10:34 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v7] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. 
This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor comment correction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/f8a7b18b..526c2788 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From ron.pressler at oracle.com Tue Mar 28 11:43:54 2023 From: ron.pressler at oracle.com (Ron Pressler) Date: Tue, 28 Mar 2023 11:43:54 +0000 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com> Message-ID: <1961D75D-1A31-4B10-A84B-72A9ADB065A6@oracle.com> Applications can now control the Java version 
available to them (and this is something we'll keep improving), and the JRE, the centralised Java environment installed on user machines, has not existed for some time. Strong encapsulation (which this change is part of) has made compatibility better than it's ever been. It is true that we cannot now fix the compatibility issues between Java 1.2 and 1.5, but rather than the reality not being known to decision makers, it looks like you may be unfamiliar with the changes made by decision makers (a few years ago) to address some of the very issues you raised. -- Ron > On 28 Mar 2023, at 04:34, Gregg Wonderly wrote: > > On Mar 27, 2023, at 11:30 AM, Andrew Dinn wrote: >> If this is pushed in jdk21 then anyone currently developing or upgrading an app to target jdk21 will only have been able to test on jdk17-jdk20 where they will not encounter the issue. So, this only leaves them a small window to detect that there will be a problem using agents in jdk21. When jdk21 arrives this may force them to delay deployment or they may even deploy unaware that the problem exists. >> >> If this is pushed in jdk22 instead of jdk21 then anyone who upgrades from jdk17 to jdk21 will not have a problem. Anyone working on an app for deployment on jdk25 will have the opportunity to test on 3 non-LTS releases which might manifest the potential agent problem before deployment. > > This is, again, where the reality that some Java users live in is very different than what seems to be known by many decision makers here. Most corporate users of Java don't control when a particular version of Java is deployed into their environments. It keeps being proposed, that somehow users are deploying a specific version of Java and getting an appropriate version of an application they use, all at one moment in time. The supposition that Java is "deployed" with a particular software system that uses it, is summarily false.
Even Linux releases by Redhat for RHEL, Centos or even Fedora don't let you pick any and every version of Java. Java applications, by the millions, were written without needing any particular version of Java, until a version broke something major like starting at Java2 (1.2) and then Java1.4 which needed a dozen fixes and then Java 1.5 that broke huge numbers of desktop apps that had not used volatile class values for so many things, including loop control values that kept loops from exiting. Then we had 1.9 that almost went out the door disabling every single Spring app in existence. And there are more and more things happening that just do not make much sense in the grand scheme of things. > > Overall Java is just not a safe place to take people for the first time. Many have had horrid problems and given up on Java. For Java1.2, Perforce invested huge time to try and create a new desktop app for their SCM system. They got to the point of almost releasing a beta to testers and then summarily threw it all away because they just could not make it work for all the things that got broke in 1.2. At my business, we have lots of each device applications where it would be a good thing to use, but because of the breakage and issues others have experienced with Java over the years, their experiences cause them to just say no to anything Java. > > Java is just randomly upgraded on people's desktops in their view. It gets replaced by the IT staff at most corps at unrelated moments that they start using a particular Java application. Those corps and their IT staff have little to no knowledge of what every Java application is let alone how it might be dependent on features that are being changed at each release of the JDK/JVM. > > The end result is that it is a surprise, always for this class of user, which version of Java will be available and which application will break this time. > > I still have lots of "jar"
file applications that I share with others and they just double click on those to run them. It's that class of user that this Java upgrades happen with software updates/distributions process completely overlooks. The Java licensing was always about you could not use Java as the sole application platform on a computer. So, all kinds of "free" desktop apps (and applets and Java Web Start) applications were created and used by literally thousands of users that are completely out of sight. > > I continue to see massive migration away from Java as the first choice for new applications amongst developers I talk to. It's not being taught to most new developers I meet. They hardly even know that Java exists or what it's capable of. Most developers seem to be taught web front end development tools or .net or golang or other languages besides Java for backend dev. > > There seems to be little chance that Java will have a place in the landscape within the next 5 years or so as those who have used Java since the mid 1990s age out of the pool of active developers and are no longer influencing tech used. > > What a sad tale the last 10 years of Java has been compared to what was possible 25 years ago, and should have happened... > > Gregg Wonderly From rkennke at openjdk.org Tue Mar 28 11:48:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 28 Mar 2023 11:48:05 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 02:49:11 GMT, David Holmes wrote: > > Is anybody familiar with the academic literature on this topic? I am sure I am not the first person who has come up with this form of locking. Maybe we could use a name that refers to some academic paper?
> > Well not to diminish this in any way but all you are doing is moving the lock-record from the stack frame (indexed from the markword) to a heap allocated side-table (indexed via the thread itself). The "fast-locking" is still the bit that use the markword to indicate the locked state, and that hasn't changed. Encoding lock state in an object header has a number of names in the literature, depending on whose scheme it was: IBM had ThinLocks; the Sun Research VM (EVM) had meta-locks; etc. Hotspot doesn't really have a name for its variation. And as I said you aren't changing that aspect but modifying what data structure is used to access the lock-records. > > So the property Jesper was looking for, IMO, may be something like `UseHeapLockRecords` - though that can unfortunately be parsed as using records for the HeapLock. :( > > I think it was mentioned somewhere above that in the Java Object Monitor prototyping work we avoided using these kinds of boolean flags by defining a single "policy" flag that could take on different values for different implementation schemes. These are simply numbered, so for example: > > * policy 0: use existing/legacy locking with stack-based lock records > * policy 1: use heavyweight locks (ie UseHeavyMonitors) > * policy 2 use the new approach with heap-allocated lock-records Well I would argue that the current implementation puts the lock record in the object header (in the form of a pointer into the stack, which is elsewhere used to identify which thread an object is locked by), and only displaces the object header onto the stack-frame, whereas the new implementation puts the lock record onto the lock-stack, which is still part of the JavaThread structure (and not heap-allocated ... although it used to be in an earlier incarnation). And it leaves the object header alone. So the correct name would be +UseStackLockRecord to turn on the new impl (where the old one would be called header lock record, or something like that). 
I like that name, but I suspect that it might be confusing because the old impl has traditionally been called 'stack-locking'. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1486716156 From rkennke at openjdk.org Tue Mar 28 13:08:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 28 Mar 2023 13:08:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 10:47:11 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/cpu/aarch64/aarch64.ad line 3954: > >> 3952: // Indicate success on completion. >> 3953: __ cmp(oop, oop); >> 3954: __ b(count); > > `aarch64_enc_fast_lock` explicitly sets NE on failure path. But this code just jumps to `no_count` without setting the flag. Does the code outside this encoding block rely on flags? The code outside this encoding block relies on flags, yes. This is very ugly. fast_unlock() jumps to no_count when the CAS fails, where the NE flag is set, therefore we don't need to set it again. > src/hotspot/share/runtime/synchronizer.cpp line 923: > >> 921: static bool is_lock_owned(Thread* thread, oop obj) { >> 922: assert(UseFastLocking, "only call this with fast-locking enabled"); >> 923: return thread->is_Java_thread() ? reinterpret_cast<JavaThread*>(thread)->lock_stack().contains(obj) : false; > > Here and later, should use `JavaThread::cast(thread)` instead of `reinterpret_cast`? It also sometimes subsumes the asserts, as `JT::cast` checks the type.
The problem is that the places where that helper function is used receive a Thread* and not a JavaThread* (FastHashCode() and inflate()), and changing those to accept JavaThread* percolates into various areas that I don't want to touch right now (e.g. finalizerService.cpp). That is the reason why that function exists to begin with. I'll do the changes that @shipilev suggested for the time being. We may want to investigate restricting the incoming type in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150580134 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150576745 From rkennke at openjdk.org Tue Mar 28 13:12:54 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 28 Mar 2023 13:12:54 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Fri, 24 Mar 2023 10:50:39 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/cpu/aarch64/aarch64.ad line 3848: > >> 3846: __ bind(slow); >> 3847: __ tst(oop, oop); // Force ZF=0 to indicate failure and take slow-path. We know that oop != null. >> 3848: __ b(no_count); > > Is this a micro-optimization? I think we can simplify the code by just setting the flags here and then jumping into the usual `__ b(cont)`. This would make the move of `__ b(cont)` unnecessary below. Well it avoids one conditional branch, so yeah I guess it's an optimization. If you don't like the b(cont) I can still move it back to where it was. It would be a dead instruction with fast-locking, though. 
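For readers following the `is_lock_owned()` exchange earlier in this thread, the shape of that helper can be sketched as below. This is a simplified, self-contained illustration and not the code in the PR: `Thread`, `JavaThread`, `LockStack` and `oop` are toy stand-ins, the fixed capacity of 8 is an assumption made for the example, and the checked `JavaThread::cast` is modelled with a plain assert.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using oop = const void*;  // stand-in for HotSpot's object reference type

// Toy per-thread lock-stack: a small array of the objects a thread has fast-locked.
class LockStack {
  static constexpr std::size_t kCapacity = 8;  // assumption for this sketch
  std::array<oop, kCapacity> _elems{};
  std::size_t _top = 0;

 public:
  bool is_full() const { return _top == kCapacity; }

  // Returns false when full; a caller would then take the slow path.
  bool push(oop o) {
    if (is_full()) return false;
    _elems[_top++] = o;
    return true;
  }

  // Pops the most recently locked object; must not be called when empty.
  oop pop() {
    assert(_top > 0 && "pop on empty lock-stack");
    return _elems[--_top];
  }

  // "Does the current thread own me?" is a linear scan, newest entry first.
  bool contains(oop o) const {
    for (std::size_t i = _top; i-- > 0;) {
      if (_elems[i] == o) return true;
    }
    return false;
  }
};

struct Thread {
  virtual ~Thread() = default;
  virtual bool is_Java_thread() const { return false; }
};

struct JavaThread : Thread {
  bool is_Java_thread() const override { return true; }
  LockStack& lock_stack() { return _lock_stack; }

  // Checked downcast, in the spirit of the JavaThread::cast suggestion.
  static JavaThread* cast(Thread* t) {
    assert(t->is_Java_thread() && "must be a JavaThread");
    return static_cast<JavaThread*>(t);
  }

 private:
  LockStack _lock_stack;
};

// Accepts a plain Thread* because the callers only have that type available.
static bool is_lock_owned(Thread* thread, oop obj) {
  return thread->is_Java_thread()
             ? JavaThread::cast(thread)->lock_stack().contains(obj)
             : false;
}
```

With the checked cast the type test happens twice on the Java-thread path (once in the ternary, once in the assert), which is harmless and keeps the helper safe to call with non-Java threads.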
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150585671 From rkennke at openjdk.org Tue Mar 28 13:29:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 28 Mar 2023 13:29:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: <8XwHgfW44r-rx3JcHxbjjuEe9ykIc3pxcUtT9HjyjU0=.69d97531-26ad-449a-a6de-e344b2a96540@github.com> On Fri, 24 Mar 2023 11:17:05 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 611: > >> 609: bind(slow); >> 610: testptr(objReg, objReg); // force ZF=0 to indicate failure >> 611: jmp(NO_COUNT); > > We set a flag on failure (`NO_COUNT`) path. Should we set the flag on success (`COUNT`) path as well? The path at COUNT already sets the ZF, there is no need to do it here. NO_COUNT doesn't clear ZF, and fast_lock_impl() may branch to slow with ZF set (on the overflow check) so we need to explicitly clear ZF. However, I just came up with a better way to check for overflow that readily clears the ZF: subtract 1 from the end-offset and make a greater-comparison instead of the greaterEquals that we currently do on the end-offset. That should simplify the code and avoid a branch. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 926: > >> 924: mov(boxReg, tmpReg); >> 925: fast_unlock_impl(objReg, boxReg, tmpReg, NO_COUNT); >> 926: jmp(COUNT); > > Do we need to care about returning proper flags here? Yes we do, but fast_unlock_impl() already does the right thing on the failure path, and COUNT does the right thing on the success path. Yes, it is all very ugly. 
*shrugs* ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150609171 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150610895 From rkennke at openjdk.org Tue Mar 28 14:00:12 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 28 Mar 2023 14:00:12 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v34] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention.
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Address @shipilev review comments (x86)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/de45c08e..798615f9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=33
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=32-33

  Stats: 28 lines in 4 files changed: 18 ins; 4 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From amitkumar at openjdk.org Tue Mar 28 14:38:31 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 28 Mar 2023 14:38:31 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v14]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Mar 2023 14:43:04 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>>
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   RISCV patch and aarch64 improvement

`{tier1, tier2} X {fast debug, slow debug, release}` testing done for s390x. PR seems clean.

@matias9927 please include the port for s390x from this commit: https://github.com/offamitkumar/jdk/commit/a582f32f97aefba33cebaf4ace540681dfc0eff5

Thanks

src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp line 652:

> 650: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo)
> 651: __ sldi(size, size, log2i_exact(sizeof(ResolvedIndyEntry)));
> 652: __ add(cache, cache, size);

@reinrich Is there any specific reason why you're not calling the load_resolved_indy_entry() method here? On s390x the build and changes are stable even when calling that helper method.

-------------

Marked as reviewed by amitkumar (Author).

PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1361010070
PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1150566670

From matsaave at openjdk.org Tue Mar 28 15:00:14 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Tue, 28 Mar 2023 15:00:14 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v15]
In-Reply-To:
References:
Message-ID:

> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>
> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>
> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>
> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>
> This change supports the following platforms: x86, aarch64, PPC, and RISCV

Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:

  s390 update

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12778/files
  - new: https://git.openjdk.org/jdk/pull/12778/files/84ed272a..dad70dc5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=13-14

  Stats: 105 lines in 4 files changed: 83 ins; 1 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/12778.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778

PR: https://git.openjdk.org/jdk/pull/12778

From greggwon at cox.net Tue Mar 28 15:35:41 2023
From: greggwon at cox.net (Gregg Wonderly)
Date: Tue, 28 Mar 2023 10:35:41 -0500
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com>
Message-ID: <041E6D30-C453-4C6C-8562-BC6545E032B1@cox.net>

> On Mar 28, 2023, at 4:13 AM, Andrew Dinn wrote:
>
> Gregg,
>
> I won't respond point by point to your comments as I cannot see any great value in doing so.
I really only want to make one general comment about your account below, which is that you appear to me to be relaying your own experience as a desktop Java user and universalising it to all users and uses of Java. While I acknowledge that you are correct to state that there can be problems with maintaining a consistent desktop setup for Java I'll counter that with two important qualifying arguments which undercut your account.

Again, the supposition is that somehow users of software systems are always surrounded by version planning and management. This is just not the case. People use software for its functionality, not because the platform has certain features. The platform is what enables software systems to be constructed and used. When the platform changes its feature set or design and removes features or disables the default use of those features, that lack of backward compatibility, when unmanaged by the platform, creates havoc for users and software developers who are providing software systems in vertical markets or into communities of users who are not educated or knowledgeable enough to understand what any of these words actually refer to.

> Firstly, the problems that you describe are general ones that apply when it comes to software configuration management for a highly user-specific and often, in consequence, highly variable environment like a desktop. They are not problems that are specific to Java or other language runtimes. Even within that category many runtimes, especially language runtimes, suffer from problems with installed version mismatches and in many cases these have been notably far worse problems than Java (as for example the Python 2/3 fiasco mentioned by Andrew Haley). However, that is not to say that the problem lies with the runtime itself.

Your supposition is again that somehow everyone has the benefit of having a guru or appointed party that manages all of their software and versioning needs.
I used the example of corporate users to draw on that environment's "random" upgrade timelines. Users get new machines as something fails or get new versions of software based on "bugs" or "risks" or other drivers that are completely unrelated to software development timelines of third party suppliers of software systems. Suggesting that just because this has happened historically, that somehow it provides the relief to do it as well is the point I am trying to counter with my arguments here.

> Version management problems cannot simply be resolved by preserving Java (or any other deployed software) in aspic. At least some minimal level of upgrade of a runtime like Java is needed to deal with emerging security and critical functional problems. However, in the longer term any platform will also need to incorporate larger scale modifications in order to cater for the continuous, dramatic change that we have seen and continue to see in all hardware and operating systems. Java has not stood still over the last 25 years for very good reasons.

Removing features and changing how the platform handles features such that software deployment or software systems have to change is the problem. There are so many things that have been changed to no real advantage to developers. Java 9 changes for "modularity" were really only targeted changes to try and manage people plugging into private APIs, because the public APIs were not providing needed features and it was difficult or impossible to use the platform without those changes. This speaks to the ineffective nature of the platform management overall. Historically, and I assert this over and over, because I was in the room where it happened, Sun was only interested in selling servers. Oracle is again only interested in selling server/backend solutions and support. The OpenJDK community is, in fact, providing a great service to the community in maintaining JDK versions that avoid forcing the user into incompatible places.

> Your reply suggests that you are unaware of the reality that those who manage non-desktop deployments plan very carefully around this need to adapt deployments to updates. Your lament that (your and others') desktop management does not include such provision may reflect the reality of some (but definitely not all) individuals or organizations. However, that lack of provision attests not to any failing on the part of the developers of Java but rather to a lack of organization, understanding and adequate preparation for *necessary maintenance* on the part of those responsible for managing said desktops.

I've done backend development with lots of different languages for decades. I've been subjected to Windows versioning, Linux versioning (Red Hat ripping out support for SATA2/SAS controllers in RHEL/CentOS 8, cutting the chain for many servers that could easily move to 8, but would require huge customization of the boot to load the driver into the RAM disk) and lots of other versioning things. But, for a long time, I was on top of what was happening with Java because I was in the middle of things like the Jini community and the desktop communities. But it became so painful and irreconcilable, if not impossible, to deal with the way that Java was being managed that I gave up. I have hundreds of thousands of lines of Java software that I've created all over the spectrum. Yet it's impossible for me to continue to be in the Java communities because I can't get users interested in Java apps because they are absolutely put off by the version management issues and trying to understand which version of the runtime will work and exactly how they can manage to keep that in place in their environment.

> Which brings us to the second point: your complaint omits to allow for the enormous efforts that Java developers perform to enable Java users to rely on and profit from exactly the sort of continuity that you misguidedly claim Java does not provide.
We are currently maintaining reliable, secure and bug free versions of jdk8, jdk11, jdk17 which allow users to continue to run applications that were originally deployed many years ago and will do so for many years to come. Indeed, as with many other large-scale, organized open source software infrastructure projects, this is the primary focus of the OpenJDK team. The number of people involved in maintaining legacy releases of Java to support existing users far outweighs those involved in developing new releases and new features.

I apologize for not including the work that the OpenJDK community does do. It's quite remarkable that there is that dedication. I think that it reflects the same level of commitment I had to Java for some time. It would be a great platform and a great place to do lots of software that would be portable. However, the lack of attention to versioning across the portability landscape and the general problems created by continued change in platform behaviors that are not backward compatible are what create a huge problem for Java's use in communities that are not commercial software systems.

> Users who put in the work needed to manage the configuration of their desktop environments can easily use these legacy releases to maintain their own desktop applications. It's not a free lunch -- admins of the desktop systems need to have at least a moderate understanding of how to configure their systems in order to maintain applications that rely on a specific Java platform release. However, to claim that the OpenJDK devs have not made this possible, worse to claim that Java has actually poisoned the well for desktop users, is a ridiculous and ignorant assertion.

Sure these legacy systems are available. But Java is not really helping the user with incompatible version changes. You have the problem that Java version installs only happen when people know what to install.
You have the problem that non-elevated-privilege users can't install new Java versions, etc. There are just so many ways that these types of changes break Java's write-once-run-anywhere promise and continue to disable existing systems without any details about what actually happened to break things.

Gregg Wonderly

From ecki at zusammenkunft.net Tue Mar 28 15:55:48 2023
From: ecki at zusammenkunft.net (Bernd)
Date: Tue, 28 Mar 2023 17:55:48 +0200
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To: <041E6D30-C453-4C6C-8562-BC6545E032B1@cox.net>
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com> , <041E6D30-C453-4C6C-8562-BC6545E032B1@cox.net>
Message-ID: <398DAECC-8910-A74C-9F92-650F1CA2A17C@hxcore.ol>

An HTML attachment was scrubbed...
URL:

From stuefe at openjdk.org Tue Mar 28 16:04:40 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 28 Mar 2023 16:04:40 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To:
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Fri, 24 Mar 2023 16:54:59 GMT, Thomas Stuefe wrote:

>> src/hotspot/share/runtime/lockStack.cpp line 42:
>>
>>> 40:
>>> 41: #ifndef PRODUCT
>>> 42: void LockStack::validate(const char* msg) const {
>>
>> Would you also like to check there are no `nullptr` elements on stack here?
>
> Please also verify against over- and underflow, and, better than just null checks, check that every oop really is an oop.
I added this to my code:
>
> assert((_offset <= end_offset()), "lockstack overflow: _offset %d end_offset %d", _offset, end_offset());
> assert((_offset >= start_offset()), "lockstack underflow: _offset %d start_offset %d", _offset, start_offset());
> int end = to_index(_offset);
> for (int i = 0; i < end; i++) {
>   assert(oopDesc::is_oop(_base[i]), "index %i: not an oop (" PTR_FORMAT ")", i, p2i(_base[i]));
> ...

Just realized that my proposal of oop-checking does not work, since during GC an oop can be moved and will temporarily be invalid.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150847182

From mike at plan99.net Tue Mar 28 16:11:15 2023
From: mike at plan99.net (Mike Hearn)
Date: Tue, 28 Mar 2023 18:11:15 +0200
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com> <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com> <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com>
Message-ID:

Hi Gregg,

Distributing little apps as JARs indeed doesn't work well anymore out of the box, but it doesn't have to be the end of the line for them. I've spent a couple of years writing a tool designed explicitly to solve all these problems [1]. You give Conveyor your JARs (or a Maven/Gradle build), it'll create and upload self-updating packages for Windows, Mac and Linux that bundle a jlinked and minified JVM, fully signed and notarized, along with a download HTML page for end users to get a big green button. It'll even draw an icon for you. You can do this from any OS, you don't need Windows or macOS to ship for them. This approach has the major downside that unless your app is open source it's not free (we gotta make money somehow) BUT if you can put that to one side, it works better than the JAR era ever could:

- No Java compatibility issues by design.
- Not blocked by browsers/operating system security.
- Apps can update more smoothly than Web Start ever allowed. - You can use OS specific integrations. - Clean uninstalls, native code handled better and so on. You might object that this is somehow more effort than just making a fat jar and sending it to people, but in practice it's not harder. You run it, out pops a bunch of files, you make them available to people, done. W.R.T. corporate deployment, note that Conveyor makes MSIX files which are Microsoft's official format and easily deployed across Windows networks. The difficulty with the send-a-JAR approach is that maintaining backwards compatibility at the level you want (Win32, web level) takes a massive level of spend, a large library of public programs which can be automatically regression tested against, and a commitment to never break anything even if it seriously disadvantages later developers, and even then things will still break despite best intentions. Decades ago this tradeoff made more sense because bandwidth and storage space were much tighter, but now it's harder to justify. That's why so few platforms do it anymore. 
[1] https://hydraulic.software/ From mdoerr at openjdk.org Tue Mar 28 16:36:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 Mar 2023 16:36:10 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v14] In-Reply-To: References: Message-ID: <7ntbBinhQ6xnrVi4hIs05Lqq9UVDL3Q1nKku2iaP5CA=.a9de7b8a-35cd-40f3-af3e-b3e31138d192@github.com> On Tue, 28 Mar 2023 12:54:41 GMT, Amit Kumar wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> RISCV patch and aarch64 improvement > > src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp line 652: > >> 650: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) >> 651: __ sldi(size, size, log2i_exact(sizeof(ResolvedIndyEntry))); >> 652: __ add(cache, cache, size); > > @reinrich Is there any specific reason, why you're not calling load_resolved_indy_entry() method here. On s390x build/changes are stable even with calling that helper method. It should work if we move the addition of `Array::base_offset_in_bytes()` into the other caller. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1150886679 From dlong at openjdk.org Tue Mar 28 16:47:59 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 28 Mar 2023 16:47:59 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Tue, 28 Mar 2023 10:53:10 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/threads.cpp line 1422: >> >>> 1420: } >>> 1421: >>> 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, oop obj) { >> >> Is this thread-safe? > > My last commit changed that code to only run during safepoints. It should be safe now, and I added an assert that verifies that it is only run at safepoint. 
I see the assert in `owning_thread_from_monitor` but not `owning_thread_from_object`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150900864

From lucy at openjdk.org Tue Mar 28 16:52:41 2023
From: lucy at openjdk.org (Lutz Schmidt)
Date: Tue, 28 Mar 2023 16:52:41 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v15]
In-Reply-To:
References:
Message-ID:

On Tue, 28 Mar 2023 15:00:14 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar.
>>
>> This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   s390 update

s390 changes look good. Thank you, Amit, for working hard to get this done.
src/hotspot/cpu/s390/templateTable_s390.cpp line 2432:

> 2430:
> 2431: // The invokedynamic is unresolved iff method is NULL
> 2432: __ z_clgij(method, 0, Assembler::bcondNotEqual, resolved); // method != 0, jump to resolved

In light of the ongoing effort to substitute all occurrences of NULL (and (void*)0) with nullptr, you may want to substitute 0 with (unsigned long)nullptr here.

src/hotspot/cpu/s390/templateTable_s390.cpp line 2442:

> 2440: __ z_lg(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset())));
> 2441: #ifdef ASSERT
> 2442: __ z_clgij(method, 0, Assembler::bcondNotEqual, resolved); // method != 0, jump to resolved

Same as above.

-------------

Marked as reviewed by lucy (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/12778#pullrequestreview-1361535849
PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1150904328
PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1150904861

From cjplummer at openjdk.org Tue Mar 28 17:39:41 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Tue, 28 Mar 2023 17:39:41 GMT
Subject: RFR: 8298046: Fix hidden but significant trailing whitespace in properties files for serviceability code
In-Reply-To: <5zDsJcRoc-qspV7yCf2m27PCmIn3R7YsUhXZ-PBXZiI=.93d0d47d-2eb9-4011-814c-6ab40f2d0c9b@github.com>
References: <5zDsJcRoc-qspV7yCf2m27PCmIn3R7YsUhXZ-PBXZiI=.93d0d47d-2eb9-4011-814c-6ab40f2d0c9b@github.com>
Message-ID:

On Fri, 2 Dec 2022 16:42:57 GMT, Magnus Ihse Bursie wrote:

> According to [the specification](https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/Properties.html#load(java.io.Reader)) trailing whitespaces in the values of properties files are (somewhat surprisingly) actually significant.
>
> We have multiple files in the JDK with trailing whitespaces in the values. For most of these files, this is likely incorrect and due to oversight, but in a few cases it might actually be intended (like "The value is: ").
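
To make the point about significant trailing whitespace concrete, here is a small Java sketch that loads two property definitions with `java.util.Properties.load(Reader)`: one whose value ends in a literal space, and one that spells the same space as the `\u0020` escape, which the properties format also accepts. The class name `TrailingSpaceDemo` and the keys `implicit` and `explicit` are made up for illustration and do not come from any JDK properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; keys and values are illustrative only.
final class TrailingSpaceDemo {
    /** Parses properties-file text held in a string. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // First value ends in an invisible literal space; second spells it as an escape.
        Properties props = load("implicit=The value is: \n"
                              + "explicit=The value is:\\u0020\n");
        // Brackets make the trailing space visible in the output.
        System.out.println("[" + props.getProperty("implicit") + "]");
        System.out.println("[" + props.getProperty("explicit") + "]");
        System.out.println(props.getProperty("implicit").equals(props.getProperty("explicit")));
    }
}
```

Both keys load to the same string, so the escaped form preserves the behaviour while making the space visible to anyone reading the file.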
> > After a discussion in the PR for [JDK-8295729](https://bugs.openjdk.org/browse/JDK-8295729), the consensus was to replace valid trailing spaces with the corresponding unicode sequence, `\u0020`. (And of course remove non-wanted trailing spaces.) > > Doing so has a dual benefit: > > 1) It makes it clear to everyone reading the code that there is a trailing space and it is intended > > 2) It will allow us to remove all actual trailing space characters, and turn on the corresponding check in jcheck to keep the properties files, just like all other source code files, free of trailing spaces. > > Ultimately, the call of whether a trailing space is supposed to be there, or is a bug, lies with the respective component teams owning these files. Thus I have split up the set of properties files with trailing spaces in several groups, to match the JDK teams, and open a JBS issue for each of them. This issue is for code I believe belong with the serviceability team. What was the reason for not moving forward with this PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/11490#issuecomment-1487340890 From sspitsyn at openjdk.org Tue Mar 28 18:57:23 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Mar 2023 18:57:23 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. 
> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: fixed trailing spaces in two files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/526c2788..2c59c54b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=06-07 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From sspitsyn at openjdk.org Tue Mar 28 19:27:37 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Mar 2023 19:27:37 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: <_b1VTI8uk98CeJkZ3_xJXKgienCA1N278sspOWuXgSY=.fcecc846-c7c4-43c4-ab60-435bd3923842@github.com> On Tue, 28 Mar 2023 18:57:23 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. 
This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed trailing spaces in two files Pushed a couple of updates which include: (1) Addressed review comments: - fixed minor issues in `libToggleNotifyJvmtiTest.cpp`: added volatile specifier and removed unneeded static variable - fixed the typo in `disable_virtual_threads_notify_jvmti` - added comments for two new functions in `jvmtiEnvBase` (enabling and disabling notifyJvmti events) - added comments to `ToggleNotifyJvmtiTest.java` (2) Refactored the `ToggleNotifyJvmtiTest.java` to run isolated sequential test cycles for the sake of `disable_virtual_threads_notify_jvmti` safety (no virtual threads are allowed to run when it is executed) (3) Extended `libToggleNotifyJvmtiTest.cpp` to request more JVMTI events: `VirtualThreadEnd`, `ThreadStart` and `ThreadEnd` (4) Other updates: - updated `JVM_VirtualThreadHideFrames` and `LibraryCallKit::inline_native_notify_jvmti_hide` to always (unconditionally) set the temporary VTMS transition bit to avoid asserts - renamed VMop: `InitNotifyJvmtiEventsMode` => 
`SetNotifyJvmtiEventsMode` - updated `VM_SetNotifyJvmtiEventsMode` to correct `jt->jvmti_thread_state()` and `jt->jvmti_vthread()` if necessary (needed to fix `JvmtiThreadState` double-deallocation issue) I hope that all review comments have been addressed now. Please let me know if anything was missed. The mach5 tiers 1-6 runs are good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1487477975 From alanb at openjdk.org Tue Mar 28 19:36:18 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 28 Mar 2023 19:36:18 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> > JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. > > There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. > > In addition, there are a small number of implementation changes to sync up from the loom fibers branch: > > - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly.
> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. > - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. > - New system property to print a stack trace when a virtual thread sets its own value of a TL. > - ThreadPerTaskExecutor is changed to use FutureTask. > > Testing: tier1-6. Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - ThreadSleepEvent refactoring - Merge - Merge - Initial sync from fibers branch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13203/files - new: https://git.openjdk.org/jdk/pull/13203/files/1c62dc8a..7906dbb4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=00-01 Stats: 2112 lines in 47 files changed: 1050 ins; 822 del; 240 mod Patch: https://git.openjdk.org/jdk/pull/13203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13203/head:pull/13203 PR: https://git.openjdk.org/jdk/pull/13203 From matsaave at openjdk.org Tue Mar 28 19:50:36 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 28 Mar 2023 19:50:36 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16] In-Reply-To: References: Message-ID: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. 
> > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar. > > This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: s390x NULL to nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/dad70dc5..72ef0475 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=14-15 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Tue Mar 28 19:51:08 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 28 Mar 2023 19:51:08 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v15] In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 15:00:14 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. 
This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar. >> >> This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > s390 update Thank you to all the reviewers for the detailed feedback, corrections, and improvements! @theRealAph @dougxc @fparain @coleenp @RealFYang @RealLucy @turbanoff @TheRealMDoerr @calvinccheung Also, thank you very much for completing the ports @reinrich, @DingliZhang, @zifeihan, and @offamitkumar! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1487506191 From matsaave at openjdk.org Tue Mar 28 19:53:59 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 28 Mar 2023 19:53:59 GMT Subject: Integrated: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 21:37:34 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar. > > This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x This pull request has now been integrated. Changeset: 3fbbfd17 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/3fbbfd17491906d707f73fe6b0db2989363c303a Stats: 1516 lines in 58 files changed: 1109 ins; 173 del; 234 mod 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry Co-authored-by: Richard Reingruber Co-authored-by: Dingli Zhang Co-authored-by: Gui Cao Co-authored-by: Amit Kumar Reviewed-by: coleenp, dnsimon, fparain, gcao, aph, fyang, amitkumar, lucy ------------- PR: https://git.openjdk.org/jdk/pull/12778 From psandoz at openjdk.org Tue Mar 28 20:00:35 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 28 Mar 2023 20:00:35 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> Message-ID: On Tue, 28 Mar 2023 19:36:18 GMT, 
Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - ThreadSleepEvent refactoring > - Merge > - Merge > - Initial sync from fibers branch src/java.base/share/classes/jdk/internal/javac/PreviewFeature.java line 72: > 70: RECORD_PATTERNS, > 71: // not used > 72: VIRTUAL_THREADS, Can the enum constant also be removed? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151091740 From dcubed at openjdk.org Tue Mar 28 21:43:01 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 28 Mar 2023 21:43:01 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30] In-Reply-To: <2cIUWaQL9GilRFtckC9SpcJVet_0Rb8SmFS1tfe8AWE=.35713c3e-1f5d-45e5-8a3c-d732070d7b81@github.com> References: <2cIUWaQL9GilRFtckC9SpcJVet_0Rb8SmFS1tfe8AWE=.35713c3e-1f5d-45e5-8a3c-d732070d7b81@github.com> Message-ID: On Tue, 28 Mar 2023 02:48:30 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Ensure safepoint when processing lock-stack > > src/hotspot/share/runtime/threads.cpp line 1433: > >> 1431: >> 1432: JavaThread* Threads::owning_thread_from_monitor(ThreadsList* t_list, ObjectMonitor* monitor) { >> 1433: assert(SafepointSynchronize::is_at_safepoint(), "not safe outside of safepoint"); > > Shouldn't this be gated on UseFastLocking? Hmmm.... `owning_thread_from_monitor()` is only called from `ObjectSynchronizer::get_lock_owner()` when `get_lock_owner()` knows that it has an ObjectMonitor in hand. I'm not at all sure that we can assert that `ObjectSynchronizer::get_lock_owner()` is only called from a safepoint. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1151177370 From lmesnik at openjdk.org Tue Mar 28 23:53:35 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 28 Mar 2023 23:53:35 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> Message-ID: On Tue, 28 Mar 2023 19:36:18 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. 
>> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - ThreadSleepEvent refactoring > - Merge > - Merge > - Initial sync from fibers branch Changes requested by lmesnik (Reviewer). src/java.base/share/classes/java/lang/System.java line 2566: > 2564: > 2565: public V executeOnCarrierThread(Callable task) throws Exception { > 2566: if (Thread.currentThread() instanceof VirtualThread vthread) { Any specific reason to don't use Thread.currentThread().isVirtual()? test/hotspot/jtreg/runtime/vthread/JNIMonitor/JNIMonitor.java line 31: > 29: * @summary Tests that JNI monitors work correctly with virtual threads > 30: * @library /test/lib > 31: * @compile JNIMonitor.java I think that test file is compiled implicitly. So this line could be just removed. The same is for all other similar tests. test/jdk/com/sun/jdi/TestScaffold.java line 980: > 978: > 979: if (wrapper.equals("Virtual")) { > 980: threadFactory = Thread.ofVirtual().factory(); Should be line 469: argInfo.targetVMArgs += "--enable-preview "; removed also? test/jdk/com/sun/management/ThreadMXBean/VirtualThreads.java line 143: > 141: long[] tids = new long[] { tid0, tid1 }; > 142: long[] cpuTimes = bean.getThreadCpuTime(tids); > 143: if (Thread.currentThread().isVirtual()) { How it worked before? test/jdk/java/lang/Thread/virtual/GetStackTrace.java line 30: > 28: * @requires vm.continuations > 29: * @modules java.base/java.lang:+open > 30: * @run testng/othervm -XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames GetStackTrace shouldn't be main/othervm ? 
test/jdk/java/lang/Thread/virtual/VirtualThreadPinnedEventThrows.java line 29: > 27: * @modules java.base/jdk.internal.event > 28: * @compile/module=java.base jdk/internal/event/VirtualThreadPinnedEvent.java > 29: * @run junit VirtualThreadPinnedEventThrows Shouldn't be 'junit/othervm' used here to ensure that updated VirtualThreadPinnedEvent is used? I don't know these details of jtreg. test/jdk/jdk/incubator/concurrent/StructuredTaskScope/PreviewFeaturesNotEnabled.java line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. Not sure I understand what happens. The file is test/jdk/jdk/incubator/concurrent/StructuredTaskScope/PreviewFeaturesNotEnabled.java while it contains is public class VirtualThreadPinnedEvent. Copied by mistake? ------------- PR Review: https://git.openjdk.org/jdk/pull/13203#pullrequestreview-1361847681 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151259675 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151175072 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151168150 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151168861 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151112603 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151153758 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151119165 From prappo at openjdk.org Wed Mar 29 00:11:36 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Wed, 29 Mar 2023 00:11:36 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> Message-ID: On Tue, 28 Mar 2023 23:47:02 GMT, Leonid Mesnik wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - ThreadSleepEvent refactoring >> - Merge >> - Merge >> - Initial sync from fibers branch > > src/java.base/share/classes/java/lang/System.java line 2566: > >> 2564: >> 2565: public V executeOnCarrierThread(Callable task) throws Exception { >> 2566: if (Thread.currentThread() instanceof VirtualThread vthread) { > > Any specific reason to don't use Thread.currentThread().isVirtual()? To use the pattern variable to call `executeOnCarrierThread` on it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151270089 From lmesnik at openjdk.org Wed Mar 29 00:18:34 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 29 Mar 2023 00:18:34 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> Message-ID: <6cV6Ar01qN8gh-PiP5uhsR294ceiYKVv3k7CR5qllw4=.d428e6ad-4fb4-4d33-8705-672eef534f1d@github.com> On Wed, 29 Mar 2023 00:08:21 GMT, Pavel Rappo wrote: >> src/java.base/share/classes/java/lang/System.java line 2566: >> >>> 2564: >>> 2565: public V executeOnCarrierThread(Callable task) throws Exception { >>> 2566: if (Thread.currentThread() instanceof VirtualThread vthread) { >> >> Any specific reason to don't use Thread.currentThread().isVirtual()? > > To use the pattern variable to call `executeOnCarrierThread` on it? 
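As an illustrative aside on the exchange above: a type pattern in `instanceof` both tests the type and binds a variable of the narrower type, which a boolean predicate such as `isVirtual()` cannot do on its own. The following is only a minimal sketch of that difference, assuming Java 16 or later. `Worker`, `SpecialWorker`, `isSpecial()`, `runOnCarrier`, `withPredicate` and `withPattern` are hypothetical stand-ins invented for illustration (the real `VirtualThread` class is not accessible outside `java.lang`), and this is not the JDK code under review.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for Thread / VirtualThread; these are not JDK classes.
class Worker {
    boolean isSpecial() { return false; }
}

final class SpecialWorker extends Worker {
    @Override boolean isSpecial() { return true; }

    // Declared only on the narrower type, so a SpecialWorker reference is needed to call it.
    String runOnCarrier(Supplier<String> task) {
        return "carrier:" + task.get();
    }
}

final class PatternVariableSketch {
    // The predicate answers "is it special?" but `w` is still typed as Worker,
    // so an explicit cast is required to reach runOnCarrier.
    static String withPredicate(Worker w, Supplier<String> task) {
        if (w.isSpecial()) {
            return ((SpecialWorker) w).runOnCarrier(task);
        }
        return "direct:" + task.get();
    }

    // The type pattern tests and binds in one step: `sw` is already a SpecialWorker.
    static String withPattern(Worker w, Supplier<String> task) {
        if (w instanceof SpecialWorker sw) {
            return sw.runOnCarrier(task);
        }
        return "direct:" + task.get();
    }

    public static void main(String[] args) {
        System.out.println(withPattern(new SpecialWorker(), () -> "task"));
        System.out.println(withPredicate(new Worker(), () -> "task"));
    }
}
```

Both methods behave the same; the pattern form simply avoids the separate cast, which appears to be the point of the reply quoted above.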
Ouch, missed the last words in the line, thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151273326 From greggwon at cox.net Wed Mar 29 00:29:59 2023 From: greggwon at cox.net (Gregg G Wonderly) Date: Tue, 28 Mar 2023 19:29:59 -0500 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: Message-ID: This is exactly my point! Why would anyone want to do something like this? This level of workaround and specialized deployment is the kind of breakage that I am referring to. I just don't understand how this kind of rigging and customization can even start to feel right. Gregg Wonderly > On Mar 28, 2023, at 11:11 AM, Mike Hearn wrote: > > Hi Gregg, > > Distributing little apps as JARs indeed doesn't work well anymore out > of the box, but it doesn't have to be the end of the line for them. > I've spent a couple of years writing a tool designed explicitly to > solve all these problems [1]. You give Conveyor your JARs (or a > Maven/Gradle build), it'll create and upload self-updating packages > for Windows, Mac and Linux that bundle a jlinked and minified JVM, > fully signed and notarized, along with a download HTML page for end > users to get a big green button. It'll even draw an icon for you. You > can do this from any OS, you don't need Windows or macOS to ship for > them. > > This approach has the major downside that unless your app is open > source it's not free (we gotta make money somehow) BUT if you can put > that to one side, it works better than the JAR era ever could: > > - No Java compatibility issues by design. > > - Not blocked by browsers/operating system security. > > - Apps can update more smoothly than Web Start ever allowed. > > - You can use OS specific integrations. > > - Clean uninstalls, native code handled better and so on.
> > You might object that this is somehow more effort than just making a > fat jar and sending it to people, but in practice it's not harder. You > run it, out pops a bunch of files, you make them available to people, > done. > > W.R.T. corporate deployment, note that Conveyor makes MSIX files which > are Microsoft's official format and easily deployed across Windows > networks. > > The difficulty with the send-a-JAR approach is that maintaining > backwards compatibility at the level you want (Win32, web level) takes > a massive level of spend, a large library of public programs which can > be automatically regression tested against, and a commitment to never > break anything even if it seriously disadvantages later developers, > and even then things will still break despite best intentions. Decades > ago this tradeoff made more sense because bandwidth and storage space > were much tighter, but now it's harder to justify. That's why so few > platforms do it anymore. > > [1] https://hydraulic.software/ From lmesnik at openjdk.org Wed Mar 29 02:43:40 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 29 Mar 2023 02:43:40 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: On Wed, 29 Mar 2023 01:16:55 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed trailing spaces in two files > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1547: > >> 1545: JvmtiThreadState* ct_state = java_lang_Thread::jvmti_thread_state(jt->threadObj()); >> 1546: JvmtiThreadState* vt_state = vt_oop != nullptr ? 
java_lang_Thread::jvmti_thread_state(vt_oop) : nullptr; >> 1547: bool virt = vt_oop != nullptr && vt_oop != ct_oop; > > You can move it inside if() in line 1554 Doesn't it makes a sense to use bool virt = vt_oop != nullptr && java_lang_VirtualThread::is_instance(vt_oop); it is used in a lot of places and looks more usual as vthread check. Might be later it would be possible to refactor this expression to some separate function. Just a proposal, only if you agree with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151305059 From lmesnik at openjdk.org Wed Mar 29 02:43:38 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 29 Mar 2023 02:43:38 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 18:57:23 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. 
>> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed trailing spaces in two files Changes requested by lmesnik (Reviewer). src/hotspot/share/prims/jvmtiEnvBase.cpp line 1547: > 1545: JvmtiThreadState* ct_state = java_lang_Thread::jvmti_thread_state(jt->threadObj()); > 1546: JvmtiThreadState* vt_state = vt_oop != nullptr ? java_lang_Thread::jvmti_thread_state(vt_oop) : nullptr; > 1547: bool virt = vt_oop != nullptr && vt_oop != ct_oop; You can move it inside if() in line 1554 src/hotspot/share/prims/jvmtiEnvBase.cpp line 1554: > 1552: // Correct jt->jvmti_thread_state() and jt->jvmti_vthread() if necessary. > 1553: // It was not maintained while notifyJvmti was disabled. > 1554: if (jt_state != ct_state && jt_state != vt_state) { Is it possible that jt_state == ct_state while the virtual thread is executed or vice versa? Just because jvmtt_state is outdated. Shouldn't we always update (set to null) link/ jvmti_vthread if _enabled == true? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 46: > 44: RawMonitorLocker agent_locker(jvmti, jni, agent_lock); > 45: > 46: vthread_started_cnt++; Wouldn't it be better to use std::atomic instead RawMonitorLocker here to reduce sync time? 
------------- PR Review: https://git.openjdk.org/jdk/pull/13133#pullrequestreview-1362120493 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151298663 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151334153 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151329179 From sspitsyn at openjdk.org Wed Mar 29 05:03:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 Mar 2023 05:03:47 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: <2ExJNB5LA7wGZA1HMaMSYrPwWQbLvGvVlxO7kRWRH5E=.ae9b1826-1527-4533-879f-fc12f796d775@github.com> On Wed, 29 Mar 2023 01:32:41 GMT, Leonid Mesnik wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1547: >> >>> 1545: JvmtiThreadState* ct_state = java_lang_Thread::jvmti_thread_state(jt->threadObj()); >>> 1546: JvmtiThreadState* vt_state = vt_oop != nullptr ? java_lang_Thread::jvmti_thread_state(vt_oop) : nullptr; >>> 1547: bool virt = vt_oop != nullptr && vt_oop != ct_oop; >> >> You can move it inside if() in line 1554 > > Doesn't it makes a sense to use > bool virt = vt_oop != nullptr && java_lang_VirtualThread::is_instance(vt_oop); > it is used in a lot of places and looks more usual as vthread check. > Might be later it would be possible to refactor this expression to some separate function. > Just a proposal, only if you agree with it. > You can move it inside if() in line 1554 > Doesn't it makes a sense to use > bool virt = vt_oop != nullptr && java_lang_VirtualThread::is_instance(vt_oop); The local `virt` is also used at line 1555. The second suggestion is good, thanks. I should have used it in the first place.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151395915 From sspitsyn at openjdk.org Wed Mar 29 05:30:38 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 Mar 2023 05:30:38 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: On Wed, 29 Mar 2023 02:29:16 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed trailing spaces in two files > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/libToggleNotifyJvmtiTest.cpp line 46: > >> 44: RawMonitorLocker agent_locker(jvmti, jni, agent_lock); >> 45: >> 46: vthread_started_cnt++; > > Wouldn't it be better to use std::atomic instead RawMonitorLocker here to reduce sync time? This RawMonitorLocker is not on a critical path. It can be useful to sync print statements when tracing is needed. I feel that in order to get a full advantage of it we need to do this for many tests in our test base. I can convert it to std::atomic if you think it is important. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151409885 From rrich at openjdk.org Wed Mar 29 07:27:26 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 29 Mar 2023 07:27:26 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v14] In-Reply-To: <7ntbBinhQ6xnrVi4hIs05Lqq9UVDL3Q1nKku2iaP5CA=.a9de7b8a-35cd-40f3-af3e-b3e31138d192@github.com> References: <7ntbBinhQ6xnrVi4hIs05Lqq9UVDL3Q1nKku2iaP5CA=.a9de7b8a-35cd-40f3-af3e-b3e31138d192@github.com> Message-ID: On Tue, 28 Mar 2023 16:33:09 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp line 652: >> >>> 650: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) >>> 651: __ sldi(size, size, log2i_exact(sizeof(ResolvedIndyEntry))); >>> 652: __ add(cache, cache, size); >> >> @reinrich Is there any specific reason, why you're not calling load_resolved_indy_entry() method here. On s390x build/changes are stable even with calling that helper method. > > It should work if we move the addition of `Array::base_offset_in_bytes()` into the other caller. > @reinrich Is there any specific reason, why you're not calling load_resolved_indy_entry() method here. On s390x build/changes are stable even with calling that helper method. Looks like this was changed on x86_64 after I ported it to ppc. Thanks for making me aware of it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1151498444 From alanb at openjdk.org Wed Mar 29 07:37:09 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 29 Mar 2023 07:37:09 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v3] In-Reply-To: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: > JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. > > There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. > > In addition, there are a small number of implementation changes to sync up from the loom fibers branch: > > - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. > - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. > - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. > - New system property to print a stack trace when a virtual thread sets its own value of a TL. > - ThreadPerTaskExecutor is changed to use FutureTask. > > Testing: tier1-6. 
Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: Test updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13203/files - new: https://git.openjdk.org/jdk/pull/13203/files/7906dbb4..8170e463 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=01-02 Stats: 67 lines in 3 files changed: 0 ins; 64 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13203/head:pull/13203 PR: https://git.openjdk.org/jdk/pull/13203 From alanb at openjdk.org Wed Mar 29 07:37:14 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 29 Mar 2023 07:37:14 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> Message-ID: On Tue, 28 Mar 2023 19:57:12 GMT, Paul Sandoz wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - ThreadSleepEvent refactoring >> - Merge >> - Merge >> - Initial sync from fibers branch > > src/java.base/share/classes/jdk/internal/javac/PreviewFeature.java line 72: > >> 70: RECORD_PATTERNS, >> 71: // not used >> 72: VIRTUAL_THREADS, > > Can the enum constant also be removed? Unfortunately not due to the bootstrapping issues with the build. It was the same for sealed classes where the constant couldn't be removed until the end of the release. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151497635 From alanb at openjdk.org Wed Mar 29 07:37:21 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 29 Mar 2023 07:37:21 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> Message-ID: On Tue, 28 Mar 2023 21:36:04 GMT, Leonid Mesnik wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - ThreadSleepEvent refactoring >> - Merge >> - Merge >> - Initial sync from fibers branch > > test/hotspot/jtreg/runtime/vthread/JNIMonitor/JNIMonitor.java line 31: > >> 29: * @summary Tests that JNI monitors work correctly with virtual threads >> 30: * @library /test/lib >> 31: * @compile JNIMonitor.java > > I think that test file is compiled implicitly. So this line could be just removed. > The same is for all other similar tests. Most of the test changes were just a few sed commands to remove lines with `@enablePreview` and to remove `--enable-preview -source ${jdk.version}` from `@compile` tags. We can remove the `@compile` tags that don't specify any other options as they aren't needed now, up to you. > test/jdk/com/sun/management/ThreadMXBean/VirtualThreads.java line 143: > >> 141: long[] tids = new long[] { tid0, tid1 }; >> 142: long[] cpuTimes = bean.getThreadCpuTime(tids); >> 143: if (Thread.currentThread().isVirtual()) { > > How it worked before? tid0 is the thread ID of a platform thread. tid1 is the thread ID of a virtual thread. 
The only change here is to allow this test to run with the main wrapper plugin ([CODETOOLS-7903373](https://bugs.openjdk.org/browse/CODETOOLS-7903373)), it would otherwise have to be excluded for those runs. > test/jdk/java/lang/Thread/virtual/GetStackTrace.java line 30: > >> 28: * @requires vm.continuations >> 29: * @modules java.base/java.lang:+open >> 30: * @run testng/othervm -XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames GetStackTrace > > shouldn't be main/othervm ? You're right, this came over from the loom repo and I didn't know that it wasn't running (no @Test). I think it would be better to run this test without ShowHiddenFrames, it just needs to know the expected bottom-most frame. > test/jdk/java/lang/Thread/virtual/VirtualThreadPinnedEventThrows.java line 29: > >> 27: * @modules java.base/jdk.internal.event >> 28: * @compile/module=java.base jdk/internal/event/VirtualThreadPinnedEvent.java >> 29: * @run junit VirtualThreadPinnedEventThrows > > Shouldn't be 'junit/othervm' used here to ensure that updated VirtualThreadPinnedEvent is used? I don't know these details of jtreg. This is using the jtreg support for patching system modules. jtreg runs the test in a new VM with --patch-module. > test/jdk/jdk/incubator/concurrent/StructuredTaskScope/PreviewFeaturesNotEnabled.java line 2: > >> 1: /* >> 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > > Not sure I understand what happens. The file is > test/jdk/jdk/incubator/concurrent/StructuredTaskScope/PreviewFeaturesNotEnabled.java > while it contains is public class VirtualThreadPinnedEvent. Copied by mistake? Thanks, I'm not sure what happened there as this test is deleted. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151507288 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151502964 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151500272 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151495733 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151496227 From aturbanov at openjdk.org Wed Mar 29 07:37:21 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 29 Mar 2023 07:37:21 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v3] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 07:31:40 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. 
>> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Test updates test/hotspot/jtreg/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.java line 40: > 38: public static void main(String[] args) throws Exception { > 39: > 40: ProcessBuilder pb = ProcessTools.createTestJvm("-javaagent:agent.jar", "-version"); Suggestion: ProcessBuilder pb = ProcessTools.createTestJvm("-javaagent:agent.jar", "-version"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151504517 From alanb at openjdk.org Wed Mar 29 08:00:36 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 29 Mar 2023 08:00:36 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: > JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. > > There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. 
> > In addition, there are a small number of implementation changes to sync up from the loom fibers branch: > > - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. > - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. > - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. > - New system property to print a stack trace when a virtual thread sets its own value of a TL. > - ThreadPerTaskExecutor is changed to use FutureTask. > > Testing: tier1-6. Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: Fix ThreadSleepEvent again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13203/files - new: https://git.openjdk.org/jdk/pull/13203/files/8170e463..bfd2c816 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=02-03 Stats: 8 lines in 1 file changed: 4 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13203/head:pull/13203 PR: https://git.openjdk.org/jdk/pull/13203 From alanb at openjdk.org Wed Mar 29 08:05:59 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 29 Mar 2023 08:05:59 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v3] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 07:29:06 GMT, Andrey Turbanov wrote: >> Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: >> >> Test updates > > 
test/hotspot/jtreg/serviceability/jvmti/vthread/premain/AgentWithVThreadTest.java line 40: > >> 38: public static void main(String[] args) throws Exception { >> 39: >> 40: ProcessBuilder pb = ProcessTools.createTestJvm("-javaagent:agent.jar", "-version"); > > Suggestion: > > ProcessBuilder pb = ProcessTools.createTestJvm("-javaagent:agent.jar", "-version"); This is noise in the original test, the only change here is to drop the "--enable-preview" from the command. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1151542296 From mike at plan99.net Wed Mar 29 08:12:44 2023 From: mike at plan99.net (Mike Hearn) Date: Wed, 29 Mar 2023 10:12:44 +0200 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: Message-ID: Why, well, you get more features, it's easier for the end user, and not any harder for the developer. Those are pretty concrete reasons why people would want to do it that way. I'd suggest trying Conveyor out yourself before worrying about rigging or customization, because straightforward Java apps don't actually need any configuration beyond specifying the URL of the update site. It's easier than Web Start was, and takes no more lines of code than configuring Gradle to make fat JARs does. It's free for open source projects so I'd say please just try it out with an open mind. If you still find it harder than publishing JARs then I'd be very interested to read an essay or blog post drilling into the differences. Maybe try following the JavaFX tutorial here: https://conveyor.hydraulic.dev/7.2/tutorial/hare/jvm/ Without that kind of concrete detail though, I'm going to feel like this is about perspectives. If you view Java as a capital-p Platform, competing on the same level as an operating system, then JARs feel natural and bundling would feel like a retreat from the glory days. 
If you perceive it as a large and fancy library then it's just like any other library and asking the user to manage it separately makes no more sense than expecting users to manage the Visual C++ runtime. The switch to jlinking and bundling in this case is all win, it removes headaches instead of adding them. From sspitsyn at openjdk.org Wed Mar 29 10:21:39 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 Mar 2023 10:21:39 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: On Wed, 29 Mar 2023 02:40:09 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed trailing spaces in two files > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1554: > >> 1552: // Correct jt->jvmti_thread_state() and jt->jvmti_vthread() if necessary. >> 1553: // It was not maintained while notifyJvmti was disabled. >> 1554: if (jt_state != ct_state && jt_state != vt_state) { > > Is it possible that jt_state == ct_state while the virtual thread is executed or vice versa? Just because jvmtt_state is outdated. > Shouldn't we always update (set to null) link/ jvmti_vthread if _enabled == true? A1: I am not sure I understand your first question correctly. What does "vice versa" mean in this context? When `notifyJvmti` events are disabled, a call to `rebind_to_jvmti_thread_state_of` is omitted in VTMS transitions, so we need to correct the state afterwards if necessary. It is possible that `jt_state == ct_state` while the virtual thread is executing in a mount/unmount transition. I keep thinking about how to make this fixup more precise. 
A better approach would be something like this:

    if (virt) {
      if (jt_state != vt_state) {
        jt->set_jvmti_thread_state(vt_state); // restore jt->jvmti_thread_state()
        jt->set_jvmti_vthread(vt_oop);        // restore jt->jvmti_vthread()
        if (vt_state != nullptr) {
          vt_state->set_thread(jt);           // restore JavaThread link
        }
      }
    } else { // !virt
      if (jt_state != ct_state) {
        jt->set_jvmti_thread_state(ct_state); // restore jt->jvmti_thread_state()
        jt->set_jvmti_vthread(nullptr);       // reset jt->jvmti_vthread()
      }
    }

But it does not work correctly now. Some adjustment is needed to make it work.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151712229 From rkennke at openjdk.org Wed Mar 29 11:10:59 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 29 Mar 2023 11:10:59 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v34] In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 14:00:12 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. 
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff
>> -- | -- | -- | -- | -- | -- | --
>> AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69%
>> Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21%
>> Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59%
>> ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15%
>> GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11%
>> LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97%
>> MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01%
>> NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54%
>> PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11%
>> FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40%
>> FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12%
>> ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34%
>> Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40%
>> RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74%
>> Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24%
>> ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58%
>> ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53%
>> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37%
>> Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97%
>> FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39%
>> FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04%
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Address @shipilev review comments (x86)

How about we call it 'lightweight locking'? Stack-locking is also lightweight, but we already call it 'stack-locking', and the new locking impl is perhaps a little more lightweight insofar that it doesn't touch the object header much, and is a bit simpler in the implementation (we could drop all of BasicLock and related code, for example).
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1488394648 From ron.pressler at oracle.com Wed Mar 29 12:14:13 2023 From: ron.pressler at oracle.com (Ron Pressler) Date: Wed, 29 Mar 2023 12:14:13 +0000 Subject: [External] : Re: Disallowing the dynamic loading of agents by default In-Reply-To: References: Message-ID: > On 29 Mar 2023, at 01:29, Gregg G Wonderly wrote: > > This is exactly my point! Why would any one want to do something like this? This level of workaround and specialized deployment is the kind of breakage that I am referring to. I just don't understand how this kind of rigging and customization can even start to feel right. 
> > Gregg Wonderly But you do understand, because you yourself have pointed out the problems caused by the old approach where the runtime and the application were provided separately. The current approach is a result of the JDK evolving to address those very problems, and it's working. Embedded custom runtimes and strong encapsulation have greatly alleviated most of them. The alternative, a runtime that never changes, is only workable for very limited applications. The old approach was guided by one primary use case, Applets, which had very limited capabilities. Indeed, similarly restricted JavaScript applications are delivered separately from their runtime, the web browser, but the more capable desktop applications written in JavaScript are delivered with an embedded runtime, for similar reasons. In other languages, including "native" ones, the trend is to similarly statically link dependencies, including even libc, or to bundle them in a container. These aren't workarounds, but the best means we have to date to deliver applications that are portable, capable, and evolvable, whether it feels right or not. Perhaps someday another approach will present itself. - Ron From rkennke at openjdk.org Wed Mar 29 13:51:51 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 29 Mar 2023 13:51:51 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v35] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. 
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). 
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
>   Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>
>
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Process lock-stack oops before inspecting them, when in foreign thread and not at safepoint. Add verifications.
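
For readers following the thread, the lock-stack and the ANONYMOUS_OWNER handoff described in the pull request text can be sketched roughly as below. This is only an illustrative, single-file model under stated assumptions and not the HotSpot code: `Object`, `Thread`, `LockStack`, `Monitor`, `anonymous_owner()`, `inflate_anonymously()` and `monitor_exit()` are hypothetical names, the header-bit CAS is left out entirely, and the lock-stack check and removal inside `monitor_exit()` are assumptions of the sketch rather than something the description states.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; none of these are HotSpot types.
struct Object {};  // plays the role of a Java object (an oop)

// Per-thread lock-stack: a small fixed-size array of locked objects.
class LockStack {
 public:
  static constexpr std::size_t kCapacity = 8;  // size quoted in the PR text

  // Answers "does the current thread own me?" with a linear scan.
  bool contains(const Object* obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) return true;
    }
    return false;
  }

  // Fast path. Returns false when the caller has to inflate instead:
  // the stack is full, or the lock would be recursive.
  bool try_push(const Object* obj) {
    assert(top_ <= kCapacity);
    if (top_ == kCapacity || contains(obj)) return false;
    slots_[top_++] = obj;
    return true;
  }

  // Drops obj from the stack, keeping the order of the other entries.
  void remove(const Object* obj) {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] != obj) continue;
      for (std::size_t j = i + 1; j < top_; ++j) slots_[j - 1] = slots_[j];
      --top_;
      return;
    }
  }

  std::size_t size() const { return top_; }

 private:
  std::array<const Object*, kCapacity> slots_{};
  std::size_t top_ = 0;
};

struct Thread {
  LockStack lock_stack;
};

// Sentinel playing the role of ANONYMOUS_OWNER; its address is the marker.
inline Thread* anonymous_owner() {
  static Thread sentinel;
  return &sentinel;
}

struct Monitor {
  std::atomic<Thread*> owner{nullptr};
};

// Contending thread: it cannot tell who holds the fast-lock, so the
// inflated monitor is recorded as anonymously owned.
inline void inflate_anonymously(Monitor& monitor) {
  monitor.owner.store(anonymous_owner());
}

// monitorexit on an inflated lock. A thread that finds the anonymous marker
// and has obj on its own lock-stack claims ownership, then releases.
// Returns false if the calling thread does not hold the lock.
inline bool monitor_exit(Thread& self, const Object* obj, Monitor& monitor) {
  Thread* owner = monitor.owner.load();
  if (owner == anonymous_owner()) {
    // Extra guard in this sketch only: verify the claim via the lock-stack.
    if (!self.lock_stack.contains(obj)) return false;
    self.lock_stack.remove(obj);
    monitor.owner.store(&self);  // fix the owner to be the exiting thread
    owner = &self;
  }
  if (owner != &self) return false;
  monitor.owner.store(nullptr);  // hand over to the contending thread
  return true;
}
```

In this model a second `try_push()` of the same object, or a push onto a full stack, fails, which corresponds to the cases where the description says the lock is inflated to a full monitor.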

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/798615f9..aa24debd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=34
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=33-34

  Stats: 108 lines in 11 files changed: 57 ins; 9 del; 42 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Wed Mar 29 14:15:34 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 29 Mar 2023 14:15:34 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v36]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.)
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer.
> Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is answered very quickly by a scan of the array. More complex queries like 'which thread owns X?' are performed only outside very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
> When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
>   This would be implemented in a separate PR.
>
>
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Relax verification in oops_do(), put back UseFastLocking in management.cpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/aa24debd..885eb613

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=35
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=34-35

  Stats: 8 lines in 4 files changed: 5 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From adinn at redhat.com Wed Mar 29 14:49:05 2023
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 29 Mar 2023 15:49:05 +0100
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To: <041E6D30-C453-4C6C-8562-BC6545E032B1@cox.net>
References: <5840A302-AD72-4308-A064-CB89868784C1@oracle.com>
 <023F29ED-CAF3-4D32-B36C-8053DDCC580A@oracle.com>
 <3e96a5df-c8b0-3574-a98b-33668391f3f0@redhat.com>
 <041E6D30-C453-4C6C-8562-BC6545E032B1@cox.net>
Message-ID:
<7949681a-2f89-0964-1bff-67238389aaf0@redhat.com>

Hi Gregg,

Thanks for your reply. I only have one small point to make.

On 28/03/2023 16:35, Gregg Wonderly wrote:
> Again, the supposition is that somehow users of software systems are always surrounded by version planning and management. . . .

No, actually, my supposition is that users of software systems *ought* to have measures in place to handle version planning and management. The fact that many users do not, especially desktop users, merely reflects a widespread misunderstanding of how complex a thing it is to run a computer.

The majority of car drivers, not being mechanically minded and skilled, invariably expect (even if they do not welcome) the regular expense required to keep their vehicles on the road. What is more, they can pick and choose from a large number of service garages who will perform the necessary maintenance. Sadly, this analogy with running a computer breaks down not at the point where expert remedy is needed but where it is sought and paid for.

regards,


Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From rkennke at openjdk.org Wed Mar 29 15:06:02 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 29 Mar 2023 15:06:02 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To:
References:
 <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Tue, 28 Mar 2023 16:00:34 GMT, Thomas Stuefe wrote:

>> Please also verify against over- and underflow and, better than just null checks, check that every oop really is an oop.
>> I added this to my code:
>>
>>   assert((_offset <= end_offset()), "lockstack overflow: _offset %d end_offset %d", _offset, end_offset());
>>   assert((_offset >= start_offset()), "lockstack underflow: _offset %d start_offset %d", _offset, start_offset());
>>   int end = to_index(_offset);
>>   for (int i = 0; i < end; i++) {
>>     assert(oopDesc::is_oop(_base[i]), "index %i: not an oop (" PTR_FORMAT ")", i, p2i(_base[i]));
>> ...
>
> Just realized that my proposal of oop-checking does not work, since during GC oops can be moved and will temporarily be invalid.

Checking for is_oop() may not work, because oops may be temporarily invalid, until the GC gets to fix them. This is especially (and probably only) true around oops_do().

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152089622

From larry.cable at oracle.com Wed Mar 29 15:39:07 2023
From: larry.cable at oracle.com (Laurence Cable)
Date: Wed, 29 Mar 2023 08:39:07 -0700
Subject: [External] : Re: Disallowing the dynamic loading of agents by default
In-Reply-To:
References:
Message-ID: <2ad7c4c0-7ca5-915a-3f70-a68e90b9c436@oracle.com>

I think while this discussion is an interesting one, and clearly one that elicits strong opinions, I believe that the focus on the proposed change to dynamic loading of agents has truly been lost therein; further exchanges have little relevance to the topic of Java serviceability and the issue at hand.

I would encourage everyone to return their attention to the initial discussion topic of agent loading defaults.

Rgds

Larry Cable, Architect, Java Platform Group

From rkennke at openjdk.org Wed Mar 29 16:05:05 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 29 Mar 2023 16:05:05 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v37]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation.

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Bounds check in lock-stack verification; only do watermark if we have one

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/885eb613..62298e49

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=36
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=35-36

  Stats: 13 lines in 3 files changed: 12 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From cjplummer at openjdk.org Wed Mar 29 17:38:31 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 29 Mar 2023 17:38:31 GMT
Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC [v2]
In-Reply-To:
References:
 <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com>
Message-ID: <6I85YvoahDNz0nMi-q_Y1hp92VjlwPsR3ySifekd3hA=.491e7983-f3f5-4cf6-958d-22725af27d08@github.com>

On Tue, 21 Mar 2023 22:38:18 GMT, Chris Plummer wrote:

>> There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which noticed (rarely) OOME on macos even when not using ZGC. From JDK-8304449:
>>
>> "macOS has a thread behavior that is not seen on linux and windows that is causing more memory usage, which sometimes leads to this unexpected OOME.
>> The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500. On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keeps track of threads (ThreadReferences) for which a ThreadStartEvent has been received but a ThreadDeathEvent has not. On linux and windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result."
>>
>> The `threads` collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. I got rid of the `threads` collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents.
>
> Chris Plummer has updated the pull request incrementally with one additional commit since the last revision:
>
>   get rid of some locals that are not needed

Can I get one more review please? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13130#issuecomment-1489025923

From sspitsyn at openjdk.org Wed Mar 29 18:02:38 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 29 Mar 2023 18:02:38 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9]
In-Reply-To:
References:
Message-ID: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com>

> The fix is to enable virtual threads support for late binding JVMTI agents.
> The fix includes:
>  - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which enables JVMTI VTMS transition notifications when an agent is loaded into a running VM.
>    This function executes a VM operation counting VTMS transition bits in all `JavaThread`s to correctly set the static counter `_VTMS_transition_count` needed for the VTMS transition protocol.
>  - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API.
>  - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update.
>  - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>
> Testing:
>  - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>  - The originally failing tests are expected to pass now:
>    `runtime/vthread/RedefineClass.java`
>    `runtime/vthread/TestObjectAllocationSampleEvent.java`
>  - In progress: Run the tiers 1-6 to make sure there are no regressions.

Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:

  review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13133/files
  - new: https://git.openjdk.org/jdk/pull/13133/files/2c59c54b..c5d8a015

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=07-08

  Stats: 14 lines in 1 file changed: 4 ins; 1 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/13133.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133

PR: https://git.openjdk.org/jdk/pull/13133

From sspitsyn at openjdk.org Wed Mar 29 18:02:42 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 29 Mar 2023 18:02:42 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 29 Mar 2023 10:16:39 GMT, Serguei Spitsyn wrote:

>> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1554:
>>
>>> 1552:     // Correct jt->jvmti_thread_state() and jt->jvmti_vthread() if necessary.
>>> 1553:     // It was not maintained while notifyJvmti was disabled.
>>> 1554:     if (jt_state != ct_state && jt_state != vt_state) {
>>
>> Is it possible that jt_state == ct_state while the virtual thread is executed or vice versa? Just because jvmti_thread_state is outdated.
>> Shouldn't we always update (set to null) link/ jvmti_vthread if _enabled == true?
>
> A1: Not sure I understand your first question correctly. What does "vice versa" mean in this context?
> When `notifyJvmti` events are disabled, the call to `rebind_to_jvmti_thread_state_of` is omitted in VTMS transitions. So, we need to correct it if necessary. It is possible that `jt_state == ct_state` while the virtual thread is executing in a mount/unmount transition. I keep thinking about how to make this fixup more precise.
>
> A better approach would be something like this:
>
>     if (virt) {
>       if (jt_state != vt_state) {
>         jt->set_jvmti_thread_state(vt_state); // restore jt->jvmti_thread_state()
>         jt->set_jvmti_vthread(vt_oop);        // restore jt->jvmti_vthread()
>         if (vt_state != nullptr) {
>           vt_state->set_thread(jt);           // restore JavaThread link
>         }
>       }
>     } else { // !virt
>       if (jt_state != ct_state) {
>         jt->set_jvmti_thread_state(ct_state); // restore jt->jvmti_thread_state()
>         jt->set_jvmti_vthread(nullptr);       // reset jt->jvmti_vthread()
>       }
>     }
>
> But it does not work correctly now. Some adjustment is needed to make it work.
>
>> Shouldn't we always update (set to null) link/ jvmti_vthread if _enabled == true?
>
> A2: Ideally, all these corrections are only needed for the case `_enable == true`. I'm testing this now.

I've updated this part. Please let me know if you still have some questions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1152304909

From cjplummer at openjdk.org Wed Mar 29 18:58:25 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 29 Mar 2023 18:58:25 GMT
Subject: RFR: 8304919: Implementation of Virtual Threads [v4]
In-Reply-To:
References:
 <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com>
Message-ID:

On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote:

>> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview.
>>
>> There is one API change from Java 19/20: the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently.
>>
>> In addition, there are a small number of implementation changes to sync up from the loom fibers branch:
>>
>> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where the end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly.
>> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes.
>> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs.
>> - New system property to print a stack trace when a virtual thread sets its own value of a TL.
>> - ThreadPerTaskExecutor is changed to use FutureTask.
>>
>> Testing: tier1-6.
>
> Alan Bateman has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix ThreadSleepEvent again

test/hotspot/jtreg/serviceability/jvmti/thread/GetFrameCount/framecnt01/framecnt01.java line 82:

> 80:
> 81:         // this is too fragile, implementation can change at any time.
> 82:         checkFrames(vThread1, false, 14);

Is this due to the `@Hidden` being added to `Continuation.enter()` and `enter0()`? If so, since both methods are now hidden, why are there not 2 fewer frames? Was there also an additional frame added somewhere?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1152337485

From cjplummer at openjdk.org Wed Mar 29 18:58:29 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 29 Mar 2023 18:58:29 GMT
Subject: RFR: 8304919: Implementation of Virtual Threads [v2]
In-Reply-To:
References:
 <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com>
 <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com>
Message-ID:

On Wed, 29 Mar 2023 07:27:50 GMT, Alan Bateman wrote:

>> test/jdk/com/sun/management/ThreadMXBean/VirtualThreads.java line 143:
>>
>>> 141:         long[] tids = new long[] { tid0, tid1 };
>>> 142:         long[] cpuTimes = bean.getThreadCpuTime(tids);
>>> 143:         if (Thread.currentThread().isVirtual()) {
>>
>> How did it work before?
>
> tid0 is the thread ID of a platform thread. tid1 is the thread ID of a virtual thread. The only change here is to allow this test to run with the main wrapper plugin ([CODETOOLS-7903373](https://bugs.openjdk.org/browse/CODETOOLS-7903373)); it would otherwise have to be excluded for those runs.

I don't see any problemlist changes. Was this test failing when using the wrapper because of the lack of problemlisting?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1152353646 From alanb at openjdk.org Wed Mar 29 19:33:15 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 29 Mar 2023 19:33:15 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v2] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <9RJ4unb3FjazYLi0BbWs1NGN9h50Z1fvAz1ZNds5mO4=.cb02f148-66bd-4429-9c30-ea4a6dcbe4d7@github.com> Message-ID: <1bIwzQpS67LQhRTy1ZlTEaASKmr0YAVxhMIKohuEydc=.d911ce7a-1f2e-4320-8c8c-66dc00174b9f@github.com> On Wed, 29 Mar 2023 18:47:03 GMT, Chris Plummer wrote: >> tid0 is the thread ID of a platform thread. tid1 is the thread ID of a virtual thread. The only change here is to allow this test to run with the main wrapper plugin ([CODETOOLS-7903373](https://bugs.openjdk.org/browse/CODETOOLS-7903373)), it would otherwise have to be excluded for those runs. > > I don't see any problemlist changes. Was this test failing when using the wrapper because of the lack of problemlisting? I added this test recently via JDK-8303242. It failed when we sync'ed up the loom repo as the test configuration there runs the tests with the jtreg main wrapper. It was trivial to fix and this avoids needing to exclude it via ProblemList-Virtual.txt. Once jtreg is promoted and there is config added to run the tests with the virtual ThreadFactory then this test will be able to run in this mode.
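
For readers following the `getThreadCpuTime(long[])` discussion above, here is a minimal sketch of the kind of bulk query the test exercises. It is not taken from `VirtualThreads.java`: the class name `BulkCpuTimeSketch` and the helper `cpuTimes` are hypothetical, and a thread ID that is assumed not to exist stands in for the "not reportable" case so that the example does not depend on the virtual thread API. The assumption, based on the ThreadMXBean specification, is that an ID the bean cannot report on comes back as -1 in the returned array.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;

public class BulkCpuTimeSketch {

    /**
     * Returns CPU times in nanoseconds for the given thread IDs.
     * Entries are -1 when the bean cannot report a time for that ID.
     */
    static long[] cpuTimes(long... ids) {
        java.lang.management.ThreadMXBean std = ManagementFactory.getThreadMXBean();
        // The bulk method lives in the JDK-specific extension, so check before casting.
        if (std instanceof com.sun.management.ThreadMXBean && std.isThreadCpuTimeSupported()) {
            return ((com.sun.management.ThreadMXBean) std).getThreadCpuTime(ids);
        }
        // Fallback for runtimes without the extension or without CPU time support.
        long[] unavailable = new long[ids.length];
        Arrays.fill(unavailable, -1L);
        return unavailable;
    }

    public static void main(String[] args) {
        long self = Thread.currentThread().getId(); // threadId() is the newer spelling on JDK 19+
        long missing = Long.MAX_VALUE;              // assumption: no live thread has this ID
        long[] times = cpuTimes(self, missing);
        System.out.println("current thread: " + times[0] + " ns");
        System.out.println("unknown id:     " + times[1]);
    }
}
```

A test that must run both directly and under a wrapper that starts it in a virtual thread would, presumably, branch on `Thread.currentThread().isVirtual()` before deciding which entries are expected to be -1, as the quoted snippet does.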
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1152393905 From alanb at openjdk.org Wed Mar 29 19:42:35 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 29 Mar 2023 19:42:35 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <4_Xz1S2uFPVW_ySuYjEtJFmmEKIWgw1DD9hLlTGyESs=.383d569f-1e08-48be-9af0-c39d4c5ee7c1@github.com> On Wed, 29 Mar 2023 18:30:01 GMT, Chris Plummer wrote: >> Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix ThreadSleepEvent again > > test/hotspot/jtreg/serviceability/jvmti/thread/GetFrameCount/framecnt01/framecnt01.java line 82: > >> 80: >> 81: // this is too fragile, implementation can change at any time. >> 82: checkFrames(vThread1, false, 14); > > Is this due to the `@hidden` being added to `Continuation.enter()` and `enter0()`? If so, since both methods are now hidden, why are there not 2 fewer frames? Was there also an additional frame added somewhere? No, it's that a lambda expression is replaced in the implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1152402303 From dcubed at openjdk.org Wed Mar 29 19:45:31 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Mar 2023 19:45:31 GMT Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC [v2] In-Reply-To: References: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com> Message-ID: On Tue, 21 Mar 2023 22:38:18 GMT, Chris Plummer wrote: >> There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. 
For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which reports an OOME seen (rarely) on macOS even when not using ZGC. From JDK-8304449: >> >> "macOS has a thread behavior that is not seen on Linux and Windows that is causing more memory usage, which sometimes leads to this unexpected OOME. The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500. On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keeps track of threads (ThreadReferences) for which a ThreadStartEvent has been received but a ThreadDeathEvent has not. On Linux and Windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result." >> >> The `threads` collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. I got rid of the `threads` collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents. > > Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: > > get rid of some locals that are not needed Thumbs up. Thanks for the explanations in the PR and in the bug reports. What kind of testing has been done on this fix? ------------- Marked as reviewed by dcubed (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13130#pullrequestreview-1363828549 From simeon.danailov.andreev at gmail.com Wed Mar 29 20:15:57 2023 From: simeon.danailov.andreev at gmail.com (S A) Date: Wed, 29 Mar 2023 23:15:57 +0300 Subject: Compile error in multi type catch block with Java 11, runtime error with Java 17 Message-ID:

Hi all,

could someone clarify the following difference in javac behavior between Java 11 and Java 17? Which one is correct?

There are two source files:

    package p;

    public class Exceptions {
        private static class E extends Exception {
            public void m() { }
        }
        public static class E1 extends E {}
        public static class E2 extends E {}
    }

    package q;

    import p.Exceptions;
    import p.Exceptions.*;

    public class Test {
        public static void main(String[] args) {
            try {
                if (false) { throw new E2(); }
                throw new E1();
            } catch (E1 | E2 e) {
                e.m(); // javac 11: ...defined in an inaccessible class...
            }
        }
    }

Compiling with Java 11 results in an error:

    q/Test.java:12: error: E.m() is defined in an inaccessible class or interface
                e.m(); // javac 11: ...defined in an inaccessible class...
                 ^
    1 error

Compiling with Java 17 doesn't result in an error, but running the code results in a runtime exception:

    Exception in thread "main" java.lang.IllegalAccessError: failed to access class p.Exceptions$E from class q.Test (p.Exceptions$E and q.Test are in unnamed module of loader 'app')
        at q.Test.main(Test.java:12)

The difference in behavior between Java 11 and Java 17 is probably introduced with: https://bugs.openjdk.org/browse/JDK-8264696

The bug-fix was done for: https://youtrack.jetbrains.com/issue/IDEA-297529

A similar bug was opened for ecj: https://github.com/eclipse-jdt/eclipse.jdt.core/issues/198

It looks like John Vasileff has gone through the spec and asserts the Java 17 behavior is not expected, see comments:

https://github.com/eclipse-jdt/eclipse.jdt.core/issues/198#issuecomment-1177897759
https://youtrack.jetbrains.com/issue/IDEA-297529/Wrong-compilation-error-reported-when-multi-type-catch-block-references-visible-method-from-invisible-type#focus=Comments-27-6280209.0-0

> because, within the catch block, the type of `e` is `Exceptions.E` per the LUB algorithm in "4.10.4. Least Upper Bound". And, per "6.6. Access Control", `Exceptions.E` should not be accessible outside of `Exceptions`. The members of `Exceptions.E` should be unavailable.
>
> In "6.6. Access Control":
>
> > Otherwise, the member or constructor is declared private. Access is permitted only when the access occurs from within the body of the top level class or interface that encloses the declaration of the member or constructor.

Best regards and thanks,
Simeon
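
To make the LUB point in the quoted analysis concrete, the following single-file sketch shows how the exception parameter of a multi-catch clause is typed. All names (`MultiCatchLubSketch`, `viaMultiCatch`, `viaSeparateCatches`) are hypothetical, and `m()` returns a string here only so the result can be observed. Because everything sits inside one top-level class, the private class `E` is accessible, so this sketch deliberately does not reproduce the cross-package error from the message above; it only illustrates that `e` in `catch (E1 | E2 e)` has the static type `lub(E1, E2)`, which is `E`, whereas separate catch clauses give `e` the static type `E1` or `E2`.

```java
public class MultiCatchLubSketch {

    // Stand-in for p.Exceptions.E: private to the enclosing top-level class.
    private static class E extends Exception {
        public String m() {
            return "m() called on " + getClass().getSimpleName();
        }
    }

    public static class E1 extends E {}
    public static class E2 extends E {}

    // The static type of e is lub(E1, E2) = E, so e.m() is resolved against E.
    static String viaMultiCatch(boolean throwFirst) {
        try {
            if (throwFirst) {
                throw new E1();
            }
            throw new E2();
        } catch (E1 | E2 e) {
            return e.m();
        }
    }

    // Shown only for contrast: here the static type of e is E1 or E2,
    // and m() is looked up as an inherited member of that public class.
    static String viaSeparateCatches(boolean throwFirst) {
        try {
            if (throwFirst) {
                throw new E1();
            }
            throw new E2();
        } catch (E1 e) {
            return e.m();
        } catch (E2 e) {
            return e.m();
        }
    }

    public static void main(String[] args) {
        System.out.println(viaMultiCatch(true));
        System.out.println(viaSeparateCatches(false));
    }
}
```

Whether the second form avoids the problem in the two-package example has not been checked here; the point is only that the two forms hand the compiler different static types for `e`.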
From cjplummer at openjdk.org Wed Mar 29 20:37:37 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 29 Mar 2023 20:37:37 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again Serviceability changes look good.
------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13203#pullrequestreview-1363917524 From cjplummer at openjdk.org Wed Mar 29 20:41:30 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 29 Mar 2023 20:41:30 GMT Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC [v2] In-Reply-To: References: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com> Message-ID: On Wed, 29 Mar 2023 19:42:58 GMT, Daniel D. Daugherty wrote: > What kind of testing has been done on this fix? I ran this test 25 times on each supported platform, both with and without ZGC. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13130#issuecomment-1489276679 From lmesnik at openjdk.org Wed Mar 29 21:49:21 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 29 Mar 2023 21:49:21 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <96yRAvTFAMp431o3TzORkDS73JONn6D4UwdLwSbUqv4=.1a128faa-78e5-427b-98f4-c5624295e396@github.com> On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently.
>> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again Test changes looks good. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13203#pullrequestreview-1363976888 From psandoz at openjdk.org Wed Mar 29 21:49:32 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 29 Mar 2023 21:49:32 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. 
>> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again src/java.base/share/classes/java/lang/Thread.java line 1546: > 1544: // bind thread to container > 1545: if (this.container != null) > 1546: throw new IllegalThreadStateException(); This check is not replicated in `VirtualThread::start`, i think the CAS protects against that. Maybe assert instead in the virtual thread implementation, thereby the comment in `setThreadContainer` can be changed to something like "`this.container` checked/asserted to be != null before call to Virtual/Thread::start(ThreadContainer)" ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1152514040 From dcubed at openjdk.org Wed Mar 29 21:50:26 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 29 Mar 2023 21:50:26 GMT Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC [v2] In-Reply-To: References: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com> Message-ID: On Tue, 21 Mar 2023 22:38:18 GMT, Chris Plummer wrote: >> There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which noticed (rarely) OOME on macos even when not using ZGC. From JDK-8304449: >> >> "macOS has a thread behavior that is not seen on linux and windows that is causing more memory usage, which sometimes leads to this unexpected OOME. The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500. On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keep tracks of threads (ThreadReferences) for which a ThreadStartEvent had been received but a ThreadDeathEvent has not. On linux and windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result. " >> >> The `threads` collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. 
I got rid of the `threads` collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents. > > Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: > > get rid of some locals that are not needed Was the com/sun/jdi/ThreadMemoryLeakTest.java test executed by itself in those runs or did you run the entire test task or at least com/sun/jdi so that parallelism was also a factor? Okay. Let's go with it and I'll keep an eye on the CI (like usual)... ------------- PR Comment: https://git.openjdk.org/jdk/pull/13130#issuecomment-1489290171 PR Comment: https://git.openjdk.org/jdk/pull/13130#issuecomment-1489299074 From cjplummer at openjdk.org Wed Mar 29 21:50:34 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 29 Mar 2023 21:50:34 GMT Subject: RFR: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC [v2] In-Reply-To: References: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com> Message-ID: <3VbqFGgJf2mLJvmv1Fd4EdCmDfYq8FM3zmw82lFNuxM=.1e221416-b5b6-4cff-9a9f-ed0fd9f686aa@github.com> On Tue, 21 Mar 2023 22:38:18 GMT, Chris Plummer wrote: >> There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which noticed (rarely) OOME on macos even when not using ZGC. From JDK-8304449: >> >> "macOS has a thread behavior that is not seen on linux and windows that is causing more memory usage, which sometimes leads to this unexpected OOME. 
The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500. On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keep tracks of threads (ThreadReferences) for which a ThreadStartEvent had been received but a ThreadDeathEvent has not. On linux and windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result. " >> >> The `threads` collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. I got rid of the `threads` collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents. > > Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: > > get rid of some locals that are not needed Just the one test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13130#issuecomment-1489294898 From dcubed at openjdk.org Wed Mar 29 22:00:11 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Mar 2023 22:00:11 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v35] In-Reply-To: References: Message-ID: On Wed, 29 Mar 2023 13:51:51 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time).
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking occurs, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
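
As an editorial aside to the quoted description, the per-thread bookkeeping it describes can be sketched in a few lines of Java. This is only a toy model under stated assumptions, not HotSpot code: the class `LockStackSketch` and its methods are hypothetical, plain object references stand in for oops, and the mark-word CAS, inflation and ANONYMOUS_OWNER handling are left out entirely. It models three points from the text: the stack holds at most 8 entries, "does the current thread own me?" is a linear scan, and both overflow and recursive locking are reported to the caller, which would then take the slow path.

```java
public class LockStackSketch {

    // "The lock-stack is fixed size, currently with 8 elements."
    static final int CAPACITY = 8;

    private final Object[] elements = new Object[CAPACITY];
    private int top = 0;

    /** Linear scan answering "does this thread hold a fast-lock on obj?". */
    boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (elements[i] == obj) {
                return true;
            }
        }
        return false;
    }

    /**
     * Records a fast-lock. Returns false when the caller would have to fall
     * back to the slow path: the stack is full, or the lock is recursive.
     */
    boolean tryPush(Object obj) {
        if (top == CAPACITY || contains(obj)) {
            return false;
        }
        elements[top++] = obj;
        return true;
    }

    /** Removes obj on unlock; entries above it shift down by one slot. */
    boolean remove(Object obj) {
        for (int i = 0; i < top; i++) {
            if (elements[i] == obj) {
                System.arraycopy(elements, i + 1, elements, i, top - i - 1);
                elements[--top] = null;
                return true;
            }
        }
        return false;
    }

    int size() {
        return top;
    }

    public static void main(String[] args) {
        LockStackSketch stack = new LockStackSketch();
        Object a = new Object();
        System.out.println("first lock on a:     " + stack.tryPush(a));
        System.out.println("recursive lock on a: " + stack.tryPush(a));
        System.out.println("owns a:              " + stack.contains(a));
        System.out.println("unlock a:            " + stack.remove(a));
        System.out.println("entries left:        " + stack.size());
    }
}
```

A linear scan is a reasonable choice in this model because, per the description, the stack rarely holds more than a handful of entries.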
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Process lock-stack oops before inspecting them, when in foreign thread and not at safepoint. Add verifications. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6242: > 6240: // After successful lock, push object on lock-stack > 6241: ldrw(t1, Address(rthread, JavaThread::lock_stack_offset_offset())); > 6242: // Strictly speaking we should emit an IN_NATIVE store.
However, non nit typo: s/non/none/ src/hotspot/share/runtime/lockStack.cpp line 56: > 54: void LockStack::verify(const char* msg) const { > 55: assert(is_self() || SafepointSynchronize::is_at_safepoint() || _thread->is_handshake_safe_for(Thread::current()) || _thread->is_suspended() || _thread->is_obj_deopt_suspend() || is_stack_watermark_processing(_thread), > 56: "access only thread-local, or when target thread safely holds stil"); nit typo: s/holds stil/holds still/ src/hotspot/share/runtime/threads.cpp line 1423: > 1421: > 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, oop obj) { > 1423: assert(SafepointSynchronize::is_at_safepoint(), "not safe outside of safepoint"); `ObjectSynchronizer::get_lock_owner()` calls `Threads::owning_thread_from_object()` and I don't think you can assert that `ObjectSynchronizer::get_lock_owner()` is only called from a safepoint. In particular, `ThreadSnapshot::initialize()` calls `ObjectSynchronizer::get_lock_owner()` and I believe that `initialize()` function can be called from non-safepoint places with M&M APIs... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152505819 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152509017 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152522309 From dcubed at openjdk.org Wed Mar 29 22:00:45 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Mar 2023 22:00:45 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v31] In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 10:28:00 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. 
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > (x86_32) Use existing thread register in fast_unlock() instead of fetching thread into a tmp register src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 702: > 700: bind(COUNT); > 701: // Count monitors in fast path > 702: increment(Address(thread, JavaThread::held_monitor_count_offset())); Okay you updated this logic to use the `thread` register that's passed in, but there are two other uses of `r15_thread` in `C2_MacroAssembler::fast_lock()` on L679 and L686 that could also be switched to `thread`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152491935 From psandoz at openjdk.org Wed Mar 29 22:11:09 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 29 Mar 2023 22:11:09 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code.
More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again Looks good. test/jdk/java/lang/Thread/virtual/TraceVirtualThreadLocals.java line 65: > 63: > 64: /** > 65: * Runs a task in a virutal thread, returning a String with any output printed Suggestion: * Runs a task in a virtual thread, returning a String with any output printed ------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13203#pullrequestreview-1364039280 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1152529524 From dcubed at openjdk.org Wed Mar 29 22:16:59 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Mar 2023 22:16:59 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v37] In-Reply-To: References: Message-ID: <1ZtuIYVxUaqsHQSGuCJB0qDHqfXzVEyidpaEKcyoxIs=.e9a9b540-5e0c-474d-a89b-09e99ea27ddb@github.com> On Wed, 29 Mar 2023 16:05:05 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Bounds check in lock-stack verification; only do watermark if we have one At this point, I've reviewed the changes in v29 -> v36 and I've posted comments for several things. I still need to update my repo to get in sync with v36.
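For orientation, the lock-stack from the PR description quoted in this thread can be pictured as a small fixed-size array per thread with push, scan and remove operations. The following is a minimal sketch under that assumption, not code from the PR: `LockStackSketch`, `ObjRef`, `try_push`, `contains` and `remove` are made-up names, a raw pointer stands in for an oop, and the mark-word CAS, GC-root scanning, recursive locking and the ANONYMOUS_OWNER handoff are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative model only; this is not the LockStack class from the PR.
using ObjRef = const void*;  // hypothetical stand-in for an oop

class LockStackSketch {
 public:
  static constexpr std::size_t kCapacity = 8;  // fixed size, as in the description

  // Fast-path push. Returns false when the stack is full, in which case the
  // caller would fall back to the slow path and inflate to a full monitor.
  bool try_push(ObjRef obj) {
    if (top_ == kCapacity) {
      return false;
    }
    slots_[top_++] = obj;
    return true;
  }

  // "Does the current thread own me?" answered by a short linear scan.
  bool contains(ObjRef obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) {
        return true;
      }
    }
    return false;
  }

  // Removes obj on unlock, shifting later entries down to keep them contiguous.
  // Returns false if obj was not on this stack.
  bool remove(ObjRef obj) {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) {
        for (std::size_t j = i + 1; j < top_; ++j) {
          slots_[j - 1] = slots_[j];
        }
        --top_;
        assert(top_ < kCapacity);
        return true;
      }
    }
    return false;
  }

  std::size_t size() const { return top_; }

 private:
  std::array<ObjRef, kCapacity> slots_{};  // zero-initialized slots
  std::size_t top_ = 0;                    // number of entries in use
};
```

In this model a caller treats a `false` return from `try_push` as the signal to take the slow path, which mirrors the overflow check the description mentions for the fast paths.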
src/hotspot/share/runtime/lockStack.cpp line 69: > 67: assert(UseFastLocking && !UseHeavyMonitors, "never use lock-stack when fast-locking is disabled"); > 68: assert((_offset <= end_offset()), "lockstack overflow: _offset %d end_offset %d", _offset, end_offset()); > 69: assert((_offset >= start_offset()), "lockstack underflow: _offset %d end_offset %d", _offset, start_offset()); You should save a local copy of `end_offset()` and a local copy of `start_offset()` in an `#ifdef ASSERT ... #endif` code block and then use those local copies in the `assert()` condition and mesg. That will guard against parallel usage by the target thread versus the verifying thread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1489399633 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152532606 From dcubed at openjdk.org Wed Mar 29 22:17:03 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Mar 2023 22:17:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v35] In-Reply-To: References: Message-ID: <-xn2Y5lZ-63av1QhbJNDx4saAzgmFT9hE394alHUFHI=.ae58b01f-d6ef-45f3-8e42-fc7b7a552b8e@github.com> On Wed, 29 Mar 2023 21:55:35 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Process lock-stack oops before inspecting them, when in foreign thread and not at safepoint. Add verifications. > > src/hotspot/share/runtime/threads.cpp line 1423: > >> 1421: >> 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, oop obj) { >> 1423: assert(SafepointSynchronize::is_at_safepoint(), "not safe outside of safepoint"); > > `ObjectSynchronizer::get_lock_owner()` calls `Threads::owning_thread_from_object()` > and I don't think you can assert that `ObjectSynchronizer::get_lock_owner()` is only > called from a safepoint. 
In particular, `ThreadSnapshot::initialize()` calls > `ObjectSynchronizer::get_lock_owner()` and I believe that `initialize()` function can > be called from non-safepoint places with M&M APIs... Update: In v35 you put back the `!UseFastLocking` check in `jmm_GetThreadInfo()` so now the `assert()` won't fire, but you've again changed the behavior of that API and now it will be able to observe fewer thread state changes than it did before. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152526559 From dcubed at openjdk.org Wed Mar 29 22:17:05 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Mar 2023 22:17:05 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v35] In-Reply-To: <-xn2Y5lZ-63av1QhbJNDx4saAzgmFT9hE394alHUFHI=.ae58b01f-d6ef-45f3-8e42-fc7b7a552b8e@github.com> References: <-xn2Y5lZ-63av1QhbJNDx4saAzgmFT9hE394alHUFHI=.ae58b01f-d6ef-45f3-8e42-fc7b7a552b8e@github.com> Message-ID: On Wed, 29 Mar 2023 22:02:02 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/threads.cpp line 1423: >> >>> 1421: >>> 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, oop obj) { >>> 1423: assert(SafepointSynchronize::is_at_safepoint(), "not safe outside of safepoint"); >> >> `ObjectSynchronizer::get_lock_owner()` calls `Threads::owning_thread_from_object()` >> and I don't think you can assert that `ObjectSynchronizer::get_lock_owner()` is only >> called from a safepoint. In particular, `ThreadSnapshot::initialize()` calls >> `ObjectSynchronizer::get_lock_owner()` and I believe that `initialize()` function can >> be called from non-safepoint places with M&M APIs... > > Update: In v35 you put back the `!UseFastLocking` check in `jmm_GetThreadInfo()` > so now the `assert()` won't fire, but you've again changed the behavior of that API > and now it will be able to observe fewer thread state changes than it did before. Please explain why you think this is "not safe". 
Yes, you can observe state that is in the process of changing, but do you think that we'll see a crash with allowing `Threads::owning_thread_from_object()` to be called from a non-safepoint place? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1152536799 From pchilanomate at openjdk.org Wed Mar 29 22:18:36 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 29 Mar 2023 22:18:36 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: On Wed, 29 Mar 2023 18:02:38 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode Hi Serguei, I took a look at the patch and it looks good to me. I have a couple of comments though. Thanks, Patricio src/hotspot/share/prims/jvmtiEnvBase.cpp line 1554: > 1552: } > 1553: // Correct jt->jvmti_thread_state() and jt->jvmti_vthread() if necessary. > 1554: // It was not maintained while notifyJvmti was disabled. While trying to understand which exact situation we are trying to guard against with this code, I ran the test without the sleeps and without this restore code and I got a crash when deleting a JvmtiThreadState (null dereference of _thread in the destructor). Probably the same crash you mentioned you had. But when debugging the crash I see that the problem is that the assumption that disabling the flag is done when no virtual threads are running is not guaranteed (see my comment there). So I think we are trying to address a case that shouldn't happen in the first place. Also not sure if applying this restore in all cases will be correct, since we might be somewhere at a transition. For example, a thread could have blocked right in the return from notifyJvmtiUnmount() in yieldContinuation(). It will look like virtual because unmount() was not executed yet, and the jvmti_thread_state should be that of the platform thread because we never changed it when mounting. We should leave the state as is but in here we would change it to the virtual thread's jvmti state. The only case where I think it makes sense to do these restore steps when enabling the flag is for those threads that are outside a transition with a mounted virtual thread, since we want to adjust the jvmti_thread_state so that it looks right on the next unmount. But in any case this is also only needed when using the WhiteBox methods, right?
In the intended case (no WhiteBox method used), after we execute this operation to enable the events, we will create the JvmtiThreadStates later in JvmtiExport::get_jvmti_interface() and the correct jvmti_thread_state and jvmti_vthread will be already set for each JavaThread. In that case can we only execute this restore code when using the WhiteBox API? test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 142: > 140: TestedThread thread = threads[i]; > 141: if (thread == null) { > 142: break; Bailing out here means we could later disable the flag while there are virtual threads running. If I comment out the first two sleeps in the main thread I can see that issue happening. To avoid relying on timing I suggest using a semaphore to wait at the beginning of finishThreads(), and signal at the end of startThreads(). ------------- PR Review: https://git.openjdk.org/jdk/pull/13133#pullrequestreview-1362140557 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1152508618 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1152514856 From pchilanomate at openjdk.org Wed Mar 29 22:18:39 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 29 Mar 2023 22:18:39 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 18:57:23 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. 
>> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed trailing spaces in two files src/hotspot/share/prims/jvmtiEnvBase.cpp line 1550: > 1548: > 1549: if (jt->is_in_VTMS_transition()) { > 1550: count++; For those threads that are in a transition when we enable the events, shouldn't we also set the jvmti_is_in_VTMS_transition field for the corresponding vthread as we do in JvmtiVTMSTransitionDisabler::start_VTMS_transition()? It seems a JvmtiVTMSTransitionDisabler that targets that particular vthread could otherwise proceed after the safepoint while that vthread is still in the transition. The "all" JvmtiVTMSTransitionDisabler won't proceed because that one does check the _VTMS_transition_count counter. I see that in general we won't have access to the vthread oop though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1151313147 From dcubed at openjdk.org Wed Mar 29 22:29:46 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty)
Date: Wed, 29 Mar 2023 22:29:46 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v37]
In-Reply-To: References: Message-ID:

On Wed, 29 Mar 2023 16:05:05 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Bounds check in lock-stack verification; only do watermark if we have one

Hmm... the latest doesn't build on my MBP13:

In file included from /System/Volumes/Data/work/shared/bug_hunt/8291555_for_jdk21.git/open/src/hotspot/share/runtime/objectMonitor.inline.hpp:33:
/System/Volumes/Data/work/shared/bug_hunt/8291555_for_jdk21.git/open/src/hotspot/share/runtime/lockStack.inline.hpp:95:56: error: use of undeclared identifier '_thread'
    StackWatermark* watermark = StackWatermarkSet::get(_thread, StackWatermarkKind::gc);
                                                       ^
1 error generated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1489414909

From dcubed at openjdk.org Wed Mar 29 22:32:09 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Wed, 29 Mar 2023 22:32:09 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v37]
In-Reply-To: References: Message-ID:

On Wed, 29 Mar 2023 16:05:05 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Bounds check in lock-stack verification; only do watermark if we have one

Ahhh...

class LockStack {
#ifdef ASSERT
  JavaThread* const _thread;
#endif

so the release build is not happy...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1489417124

From mchung at openjdk.org Wed Mar 29 23:03:42 2023
From: mchung at openjdk.org (Mandy Chung)
Date: Wed, 29 Mar 2023 23:03:42 GMT
Subject: RFR: 8304919: Implementation of Virtual Threads [v4]
In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID:

On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote:

>> JEP 444 proposes to make virtual threads a permanent feature in Java 21.
The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again The stack walker change looks good. I also reviewed other changes which look fine to me. ------------- Marked as reviewed by mchung (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13203#pullrequestreview-1364091585

From cjplummer at openjdk.org Wed Mar 29 23:32:31 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 29 Mar 2023 23:32:31 GMT
Subject: Integrated: 8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC
In-Reply-To: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com>
References: <_iuioW7_e46CcwWlfoyujmo5Bj5Kgs-UN9gqxMmlWVM=.dd14ef0c-601e-4243-8631-85a5f712fddb@github.com>
Message-ID:

On Tue, 21 Mar 2023 21:38:12 GMT, Chris Plummer wrote:

> There are two GC related issues with this test that are being addressed. The test was limiting the heap size to 6m so if there is still a leak, it will be detected quickly. This proved to be too small of a size when using ZGC. For the most part changing the size to 7m fixed this issue. However, I was still seeing frequent issues with ZGC on macOS. This is explained by [JDK-8304449](https://bugs.openjdk.org/browse/JDK-8304449), which noticed (rarely) OOME on macOS even when not using ZGC. From JDK-8304449:
>
> "macOS has a thread behavior that is not seen on linux and windows that is causing more memory usage, which sometimes leads to this unexpected OOME. The debuggee side of the test constantly creates threads that do little more than a short sleep. It has a counter of "live" threads, and won't let that go over 500. On the debugger side it is just tracking ThreadStartEvents and ThreadDeathEvents. It keeps track of threads (ThreadReferences) for which a ThreadStartEvent had been received but a ThreadDeathEvent has not. On linux and windows the count of outstanding threads is generally in the 200-400 range, sometimes briefly going over 500. However, on macOS it is closer to 2400. This means a lot more ThreadReferences being tracked, which means more memory usage, so sometimes you see an OOME on macOS as a result."
>
> The `threads` collection mainly existed just so its size could be used to log the number of outstanding ThreadDeathEvents. I got rid of the `threads` collection and instead am just tracking the number of ThreadStartEvents and ThreadDeathEvents, and computing the difference to get the number of outstanding ThreadDeathEvents.

This pull request has now been integrated.

Changeset: 9643f654
Author: Chris Plummer
URL: https://git.openjdk.org/jdk/commit/9643f654da23cfc336d36385031251d039e0550d
Stats: 19 lines in 2 files changed: 0 ins; 9 del; 10 mod

8304436: com/sun/jdi/ThreadMemoryLeakTest.java fails with "OutOfMemoryError: Java heap space" with ZGC
8304449: com/sun/jdi/ThreadMemoryLeakTest.java times out

Reviewed-by: lmesnik, dcubed

-------------

PR: https://git.openjdk.org/jdk/pull/13130

From david.holmes at oracle.com Wed Mar 29 23:37:59 2023
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 30 Mar 2023 09:37:59 +1000
Subject: Compile error in multi type catch block with Java 11, runtime error with Java 17
In-Reply-To: References:
Message-ID: <2e1ae547-9961-5108-44d4-4795eee63346@oracle.com>

Hi,

This is the wrong mailing list for this issue - you want compiler-dev for javac issues.

Cheers,
David

On 30/03/2023 6:15 am, S A wrote:
> Hi all,
>
> could someone clarify the following difference in javac behavior between
> Java 11 and Java 17? Which one is correct?
>
> There are two source files:
>
> package p;
> public class Exceptions {
>     private static class E extends Exception {
>         public void m() { }
>     }
>     public static class E1 extends E {}
>     public static class E2 extends E {}
> }
>
> package q;
> import p.Exceptions;
> import p.Exceptions.*;
> public class Test {
>     public static void main(String[] args) {
>         try {
>             if (false) { throw new E2(); }
>             throw new E1();
>         } catch (E1 | E2 e) {
>             e.m(); // javac 11: ...defined in an inaccessible class...
>         }
>     }
> }
>
> Compiling with Java 11 results in an error:
>
> q/Test.java:12: error: E.m() is defined in an inaccessible class or
> interface
>             e.m(); // javac 11: ...defined in an inaccessible class...
>              ^
> 1 error
>
> Compiling with Java 17 doesn't result in an error, but running the code
> results in a runtime exception:
>
> Exception in thread "main" java.lang.IllegalAccessError: failed to
> access class p.Exceptions$E from class q.Test (p.Exceptions$E and q.Test
> are in unnamed module of loader 'app')
>         at q.Test.main(Test.java:12)
>
> The difference in behavior between Java 11 and Java 17 is probably
> introduced with: https://bugs.openjdk.org/browse/JDK-8264696
>
> The bug-fix was done for:
> https://youtrack.jetbrains.com/issue/IDEA-297529
>
> A similar bug was opened for ecj:
> https://github.com/eclipse-jdt/eclipse.jdt.core/issues/198
>
> It looks like John Vasileff has gone through the spec and asserts the
> Java 17 behavior is not expected, see comments:
>
> https://github.com/eclipse-jdt/eclipse.jdt.core/issues/198#issuecomment-1177897759
> https://youtrack.jetbrains.com/issue/IDEA-297529/Wrong-compilation-error-reported-when-multi-type-catch-block-references-visible-method-from-invisible-type#focus=Comments-27-6280209.0-0
>
> > because, within the catch block, the type of `e` is `Exceptions.E`
> per the LUB algorithm in "4.10.4. Least Upper Bound". And, per "6.6.
> Access Control", `Exceptions.E` should not be accessible outside of
> `Exceptions`. The members of `Exceptions.E` should be unavailable.
> >
> > In "6.6. Access Control":
> >
> > > Otherwise, the member or constructor is declared private. Access is
> permitted only when the access occurs from within the body of the top
> level class or interface that encloses the declaration of the member or
> constructor.
> > Best regards and thanks, > Simeon From sspitsyn at openjdk.org Thu Mar 30 00:59:25 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 00:59:25 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: On Wed, 29 Mar 2023 01:52:18 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed trailing spaces in two files > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1550: > >> 1548: >> 1549: if (jt->is_in_VTMS_transition()) { >> 1550: count++; > > For those threads that are in a transition when we enable the events, shouldn't we also set the jvmti_is_in_VTMS_transition field for the corresponding vthread as we do in JvmtiVTMSTransitionDisabler::start_VTMS_transition()? It seems a JvmtiVTMSTransitionDisabler that targets that particular vthread could otherwise proceed after the safepoint while that vthread is still in the transition. The "all" JvmtiVTMSTransitionDisabler won't proceed because that one does check the _VTMS_transition_count counter. I see that in general we won't have access to the vthread oop though. Nice catch, thanks! Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1152627288 From sspitsyn at openjdk.org Thu Mar 30 01:14:28 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 01:14:28 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v10] In-Reply-To: References: Message-ID: <1kg4aiIq4QllkMfVInev4q-FtSem-N_xjuVpliowQAA=.dfcc7878-c186-4d31-9a93-2e7a22c404ab@github.com> > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. 
This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: refactored jt->jvmti_thread_state() corrections in VM_SetNotifyJvmtiEventsMode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/c5d8a015..d1d4c030 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=08-09 Stats: 74 lines in 2 files changed: 51 ins; 22 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From sspitsyn at openjdk.org Thu Mar 30 01:35:32 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 01:35:32 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: On Wed, 29 Mar 2023 21:36:04 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the 
pull request incrementally with one additional commit since the last revision:
>>
>> review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1554:
>
>> 1552: }
>> 1553: // Correct jt->jvmti_thread_state() and jt->jvmti_vthread() if necessary.
>> 1554: // It was not maintained while notifyJvmti was disabled.
>
> While trying to understand which exact situation we are trying to guard against with this code, I ran the test without the sleeps and without this restore code and I got a crash when deleting a JvmtiThreadState (null dereference of _thread in the ~()). Probably the same crash you mentioned you had. But when debugging the crash I see that the problem is that the assumption that disabling the flag is done when no virtual threads are running is not guaranteed (see my comment there). So I think we are trying to address a case that shouldn't happen in the first place. Also not sure if applying this restore in all cases will be correct, since we might be somewhere at a transition. For example, a thread could have blocked right in the return from notifyJvmtiUnmount() in yieldContinuation(). It will look like a virtual thread because unmount() was not executed yet, and the jvmti_thread_state should be that of the platform thread because we never changed it when mounting. We should leave the state as is but in here we would change it to the virtual thread's jvmti state. The only case where I think it makes sense to do these restore steps when enabling the flag is for those threads that are outside a transition with a mounted virtual thread, since we want to adjust the jvmti_thread_state so that it looks right on the next unmount.
> But in any case this is also only needed when using the WhiteBox methods right? In the intended case (no WhiteBox method used), after we execute this operation to enable the events, we will create the JvmtiThreadStates later in JvmtiExport::get_jvmti_interface() and the correct jvmti_thread_state and jvmti_vthread will be already set for each JavaThread. In that case can we only execute this restore code when using the WhiteBox API?

I've just pushed my update with refactoring of these corrections in `VM_SetNotifyJvmtiEventsMode` after discussion of this code with Leonid. I hope it resolved at least part of your concerns. This kind of problem exists only when we disable+enable notifyJvmti events multiple times for testing purposes. This code with corrections of jt->jvmti_thread_state() is not needed when we enable notifyJvmti events just once. You are correct that the ability to disable notifyJvmti events is implemented using the WhiteBox API. My last update has a comment explaining this.

> In that case can we only execute this restore code when using the WhiteBox API?

This is a good suggestion which we already discussed privately with Leonid. I've got an idea how to implement this check.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1152641723 From sspitsyn at openjdk.org Thu Mar 30 02:03:23 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 02:03:23 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: <6ID-wSr38DrHzn45CCdg3DBaGzIJSXplXOqIg83pXSY=.2f187a59-251b-418b-a5d4-ebd7f11355bd@github.com> On Wed, 29 Mar 2023 22:15:35 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode > > Hi Serguei, > > I took a look at the patch and looks good to me. I have a couple of comments though. > > Thanks, > Patricio @pchilano Thank you for looking at this! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1489576601 From sspitsyn at openjdk.org Thu Mar 30 02:09:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 02:09:24 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: On Wed, 29 Mar 2023 21:44:56 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 142: > >> 140: TestedThread thread = threads[i]; >> 141: if (thread == null) { >> 142: break; > > Bailing out here means we could later disable the flag while there are virtual threads running. 
If I comment out the first two sleeps in the main thread I can see that issue happening. To avoid relying on timing I suggest using a semaphore to wait at the beginning of finishThreads(), and signal at the end of startThreads(). Good comment, thanks. I made `startThreads()` method to be synchronized instead of `startThread()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1152656758 From sspitsyn at openjdk.org Thu Mar 30 03:48:02 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 03:48:02 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v11] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: one more review round fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/d1d4c030..d05ed921 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=09-10 Stats: 20 lines in 2 files changed: 15 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From jpai at openjdk.org Thu Mar 30 05:59:09 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 30 Mar 2023 05:59:09 GMT Subject: RFR: 8304988: unnecessary dash in @param gives double-dash in docs Message-ID: Can I please get a review of this trivial doc only change which addresses https://bugs.openjdk.org/browse/JDK-8304988? I've run `make docs-image` after this change and the generated javadoc for this class looks fine. ------------- Commit messages: - 8304988: unnecessary dash in @param gives double-dash in docs Changes: https://git.openjdk.org/jdk/pull/13239/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13239&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304988 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/13239.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13239/head:pull/13239 PR: https://git.openjdk.org/jdk/pull/13239 From alanb at openjdk.org Thu Mar 30 07:32:17 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 30 Mar 2023 07:32:17 GMT Subject: RFR: 8304988: unnecessary dash in @param gives double-dash in docs In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 05:51:57 GMT, Jaikiran Pai wrote: > Can I please get a review of this trivial doc only change which addresses https://bugs.openjdk.org/browse/JDK-8304988? 
> > I've run `make docs-image` after this change and the generated javadoc for this class looks fine. Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13239#pullrequestreview-1364491260 From rkennke at openjdk.org Thu Mar 30 08:38:02 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 08:38:02 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v38] In-Reply-To: References: Message-ID: <1bJhB0umcd38IHnN9Vm38nFnYp588pVrlsnG4maB-GY=.b03f2444-da85-4385-bd2f-d080be0fd940@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
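
The lock-stack and fast-path steps described above can be modeled in a few lines. The following Java sketch is only an illustration under stated assumptions: the real implementation lives in HotSpot's C++ and assembly code, and the names `FastLockSketch`, `Obj`, `LockStack`, `tryFastLock` and `tryFastUnlock`, as well as the `01` value used for 'unlocked', are hypothetical choices made for this sketch, not identifiers taken from the PR. It shows the CAS of the two low header bits to `00`, the push onto a fixed-size per-thread array, the linear ownership scan, and the cases that fall back to the slow path (overflow, recursive locking, inflated or otherwise locked header).

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only: all names here are hypothetical, not HotSpot's.
final class FastLockSketch {
    static final long LOCK_MASK   = 0b11; // lowest two header bits
    static final long FAST_LOCKED = 0b00; // 'fast-locked', as in the description
    static final long UNLOCKED    = 0b01; // assumption of this sketch
    static final int  CAPACITY    = 8;    // fixed lock-stack size from the description

    // Stand-in for an object header (mark word only).
    static final class Obj {
        final AtomicLong mark = new AtomicLong(UNLOCKED);
    }

    // Per-thread lock stack: a small fixed-size array of object references.
    static final class LockStack {
        private final Obj[] elems = new Obj[CAPACITY];
        private int top = 0;

        boolean isFull()  { return top == CAPACITY; }
        boolean isEmpty() { return top == 0; }
        int size()        { return top; }
        void push(Obj o)  { elems[top++] = o; }
        Obj peek()        { return elems[top - 1]; }
        void pop()        { elems[--top] = null; }

        // 'Does the current thread own me?' is a linear scan of a tiny array.
        boolean contains(Obj o) {
            for (int i = top - 1; i >= 0; i--) {
                if (elems[i] == o) return true;
            }
            return false;
        }
    }

    // Returns true on fast-path success; false means 'take the slow path'
    // (which, per the description, would inflate to a full monitor).
    static boolean tryFastLock(LockStack ls, Obj o) {
        if (ls.isFull()) return false;                  // lock-stack overflow
        long m = o.mark.get();
        if ((m & LOCK_MASK) != UNLOCKED) return false;  // already locked (also the recursive case)
        long locked = (m & ~LOCK_MASK) | FAST_LOCKED;   // upper bits are left untouched
        if (!o.mark.compareAndSet(m, locked)) return false;
        ls.push(o);
        return true;
    }

    // Fast unlock only when the object is the most recent fast-lock of this thread.
    static boolean tryFastUnlock(LockStack ls, Obj o) {
        if (ls.isEmpty() || ls.peek() != o) return false;  // empty stack or unbalanced unlock
        long m = o.mark.get();
        if ((m & LOCK_MASK) != FAST_LOCKED) return false;  // header changed, e.g. inflated
        if (!o.mark.compareAndSet(m, (m & ~LOCK_MASK) | UNLOCKED)) return false;
        ls.pop();
        return true;
    }
}
```

In this model a second `tryFastLock` on an object the thread already holds returns `false`, which corresponds to the statement that recursive locking is not handled on the fast path. The anonymous-owner handover is deliberately left out, since it belongs to the monitor slow path.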
> > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

More verification through zapping unused entries; get rid of _thread field

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/10907/files
- new: https://git.openjdk.org/jdk/pull/10907/files/62298e49..e5afb43c

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=37
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=36-37

Stats: 58 lines in 6 files changed: 38 ins; 1 del; 19 mod
Patch: https://git.openjdk.org/jdk/pull/10907.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907
PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Mar 30 08:50:57 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 30 Mar 2023 08:50:57 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v39]
In-Reply-To:
References:
Message-ID:

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

More verification through zapping unused entries, x86 parts

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/10907/files
- new: https://git.openjdk.org/jdk/pull/10907/files/e5afb43c..db856dc1

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=38
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=37-38

Stats: 11 lines in 3 files changed: 9 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/10907.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907
PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Mar 30 11:46:44 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 30 Mar 2023 11:46:44 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v40]
In-Reply-To:
References:
Message-ID: <0c1Yzyh4d0SRjKrfASJtuWOlc4ZVT5xwVepogKAdags=.fe4a8a57-a910-4a75-99dc-8e99bcc72e3c@github.com>

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

Address more of @shipilev's review comments

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/10907/files
- new: https://git.openjdk.org/jdk/pull/10907/files/db856dc1..57171883

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=39
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=38-39

Stats: 11 lines in 3 files changed: 7 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/10907.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907
PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Mar 30 11:48:46 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 30 Mar 2023 11:48:46 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To:
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Sat, 25 Mar 2023 08:55:25 GMT, Thomas Stuefe wrote:

> I have another question about the asymmetric unlocking code in `InterpreterMacroAssembler::unlock_object`.
> > We go through here for both fast-locked and fat OM locks, right? If so, shouldn't we do the asymmetric lock check only for the fast locked case? Otherwise Lockstack may be empty, so we compare the word preceding the first slot, which would cause us to always break into the slow case? > > Sorry if I miss something here. Uh, yes, indeed. It works by accident, I suppose, because we don't segfault on the word preceding the lock-stack, and monitor-locking takes the slow-case in interpreter, anyway. But yeah, it's better to check for it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1490163377 From mgronlun at openjdk.org Thu Mar 30 13:13:45 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 30 Mar 2023 13:13:45 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: <5uEVRUEr0vSBiTHqKpKVwG1k-v5UrFr9RVAip3K8NSg=.a7bf35b5-b372-4ba6-b217-642c6ad4e2a8@github.com> References: <5uEVRUEr0vSBiTHqKpKVwG1k-v5UrFr9RVAip3K8NSg=.a7bf35b5-b372-4ba6-b217-642c6ad4e2a8@github.com> Message-ID: On Tue, 14 Mar 2023 12:23:08 GMT, Markus Grönlund wrote: >> src/hotspot/share/prims/agentList.cpp line 419: >> >>> 417: const jint err = (*on_load_entry)(&main_vm, const_cast(agent->options()), NULL); >>> 418: if (err != JNI_OK) { >>> 419: vm_exit_during_initialization("-Xrun library failed to init", agent->name()); >> >> Do you need to be back in `_thread_in_vm` before exiting? > > Hmm. This was ported as is. I will double-check. Looks like there is no requirement to be in _thread_in_vm before invoking vm_exit_during_initialization(). vm_perform_shutdown_actions() will forcibly set the thread state to _thread_in_native (no transition).
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1153243069

From rkennke at openjdk.org Thu Mar 30 13:27:01 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 30 Mar 2023 13:27:01 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v41]
In-Reply-To:
References:
Message-ID:

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

Use ANONYMOUS_OWNER constant in SA

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/10907/files
- new: https://git.openjdk.org/jdk/pull/10907/files/57171883..a75de7e5

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=40
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=39-40

Stats: 17 lines in 7 files changed: 7 ins; 0 del; 10 mod
Patch: https://git.openjdk.org/jdk/pull/10907.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907
PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Mar 30 13:29:12 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 30 Mar 2023 13:29:12 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29]
In-Reply-To:
References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com>
Message-ID:

On Thu, 30 Mar 2023 11:45:54 GMT, Roman Kennke wrote:

>> I have another question about the asymmetric unlocking code in `InterpreterMacroAssembler::unlock_object`.
>>
>> We go through here for both fast-locked and fat OM locks, right? If so, shouldn't we do the asymmetric lock check only for the fast locked case? Otherwise Lockstack may be empty, so we compare the word preceding the first slot, which would cause us to always break into the slow case?
>>
>> Sorry if I miss something here.
>
>> I have another question about the asymmetric unlocking code in `InterpreterMacroAssembler::unlock_object`.
>>
>> We go through here for both fast-locked and fat OM locks, right? If so, shouldn't we do the asymmetric lock check only for the fast locked case? Otherwise Lockstack may be empty, so we compare the word preceding the first slot, which would cause us to always break into the slow case?
>>
>> Sorry if I miss something here.
>
> Uh, yes, indeed.
It works by accident, I suppose, because we don't segfault on the word preceding the lock-stack, and monitor-locking takes the slow-case in interpreter, anyway. But yeah, it's better to check for it. > @rkennke Question about ZGC and LockStack::contains(): how does this work with colored pointers? Don't we have to mask the color bits out somehow when comparing? E.g. using `ZAddress::offset()` ? We must ensure that the oops are in a valid state. I recently added code to ::contains() to call start_processing() when called from a foreign thread. When inspecting its own thread, we are always good, because stack-watermark is processed right after leaving a safepoint, before resuming normal operations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1490302961 From rkennke at openjdk.org Thu Mar 30 13:35:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 13:35:14 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30] In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 00:17:21 GMT, Dean Long wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Ensure safepoint when processing lock-stack > > src/hotspot/cpu/aarch64/aarch64.ad line 3844: > >> 3842: >> 3843: // Indicate success on completion. >> 3844: __ cmp(oop, oop); // Force ZF=1 to indicate success. > > It looks like `fast_lock` already sets ZF=1 on success/fall-through. Why not document this as part of the interface, then this `cmp` will be redundant? Indeed. I was assuming that the instructions that follow the CAS might be affecting the NZCV flags (like they do on x86), but apparently they don't. Very nice. I am removing the instruction here and following fast_unlock(). > src/hotspot/share/interpreter/interpreterRuntime.cpp line 740: > >> 738: if (!UseHeavyMonitors && UseFastLocking) { >> 739: // This is a hack to get around the limitation of registers in x86_32. 
We really >> 740: // send an oopDesc* instead of a BasicObjectLock*. > > I don't understand what this is referring to. Trying to avoid passing an extra argument? I updated the comment in an earlier commit to say: // TODO: We accept elem as void* to workaround a limitation of registers in x86_32. Interpreter // code is really sending an oopDesc* here. // The problem is that we would need to preserve the register that holds the BasicObjectLock, // but we are using that register to hold the thread. We don't have enough registers to // also keep the BasicObjectLock, but we don't really need it anyway, we only need // the object. See also InterpreterMacroAssembler::lock_object(). // As soon as traditional stack-locking goes away we can change elem to be oopDesc* // (also in monitorexit below). I hope that is clearer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153268963 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153270625 From rkennke at openjdk.org Thu Mar 30 13:58:45 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 13:58:45 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v42] In-Reply-To: References: Message-ID: Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Splice entry points for InterpreterRuntime::monitorenter() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/a75de7e5..b0ffd9af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=40-41 Stats: 59 lines in 5 files changed: 28 ins; 15 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 30 13:58:50 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 13:58:50 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30] In-Reply-To: References: Message-ID:
<6K4zNT0oaIZAJEfULur4xhnrOKdjtzyJaDwwt_kxtqc=.ed98c939-a6b7-4edf-bd7b-5488369e85df@github.com> On Tue, 28 Mar 2023 00:53:57 GMT, Dean Long wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Ensure safepoint when processing lock-stack > > src/hotspot/share/interpreter/interpreterRuntime.cpp line 746: > >> 744: ObjectSynchronizer::enter(h_obj, nullptr, current); >> 745: return; >> 746: } > > Why not put this code in a new function declared as InterpreterRuntime::monitorenter(JavaThread* current, oop obj) and have the caller decide which one to call? That is a good suggestion, I did that change. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153300881 From rkennke at openjdk.org Thu Mar 30 14:00:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 14:00:09 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30] In-Reply-To: References: <2cIUWaQL9GilRFtckC9SpcJVet_0Rb8SmFS1tfe8AWE=.35713c3e-1f5d-45e5-8a3c-d732070d7b81@github.com> Message-ID: <0Hci_F96En1EB9q2DfH40AREGZ13zPQaCmuUi_HUN2U=.7235f9f7-66a4-4125-8916-4ba320c0e2b2@github.com> On Tue, 28 Mar 2023 21:39:21 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/threads.cpp line 1433: >> >>> 1431: >>> 1432: JavaThread* Threads::owning_thread_from_monitor(ThreadsList* t_list, ObjectMonitor* monitor) { >>> 1433: assert(SafepointSynchronize::is_at_safepoint(), "not safe outside of safepoint"); >> >> Shouldn't this be gated on UseFastLocking? > > Hmmm.... `owning_thread_from_monitor()` is only called from > `ObjectSynchronizer::get_lock_owner()` when `get_lock_owner()` > knows that it has an ObjectMonitor in hand. I'm not at all sure that > we can assert that `ObjectSynchronizer::get_lock_owner()` is > only called from a safepoint. 
There has been a single call path from management.cpp that is not calling this code at a safepoint, and I changed that code to take the safepoint-taking variant code-path when using UseFastLocking. That path already existed and has been used when max_depth == 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153307575 From sspitsyn at openjdk.org Thu Mar 30 14:09:04 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 14:09:04 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. 
>> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again Just wanted to check if we want to bump `@since` version of `OpaqueFrameException` from 19 to 21. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13203#issuecomment-1490367599 From rkennke at openjdk.org Thu Mar 30 14:32:01 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 14:32:01 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v43] In-Reply-To: References: Message-ID: Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Don't retry on inflate; use aarch64 ZF from fast_lock(); bunch of small changes based on reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/b0ffd9af..e0e019e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=41-42 Stats: 31 lines in 6 files changed: 0 ins; 10 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 30 14:32:03 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 14:32:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30] In-Reply-To: <_b88qICP5AK4NgLpW0fXKgoa8ObWoZM1GvXKpoNMxlU=.85187d4a-968e-4283-b46d-4290ee3cc402@github.com> References: <_b88qICP5AK4NgLpW0fXKgoa8ObWoZM1GvXKpoNMxlU=.85187d4a-968e-4283-b46d-4290ee3cc402@github.com> Message-ID: On Tue, 28 Mar 2023 02:56:47 GMT, David Holmes
wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Ensure safepoint when processing lock-stack
> 
> src/hotspot/share/runtime/lockStack.hpp line 43:
> 
>> 41:   // efficient addressing in generated code.
>> 42:   int _offset;
>> 43:   oop _base[CAPACITY];
> 
> Should we be using `OopHandle` here rather than raw oops? Would that not avoid issues with scanning the lock-stack only during safepoints?
> 
> Another alternative for a STW safepoint would be to do a handshake with the target threads.

I added verification of sane thread states (self, safepointed, handshaked, watermark-processing, suspended) when scanning the lock-stack. OopHandle would impact performance a bit too much for my taste. (If performance is not a prime concern, I would be happy to ditch stack-locking altogether and use monitors-only instead.) Yes, it might be useful if management code could handshake threads one-by-one instead of safepointing, but that's not within the scope of this PR, I think.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153312238

From rkennke at openjdk.org Thu Mar 30 14:32:05 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 30 Mar 2023 14:32:05 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v30]
In-Reply-To:
References:
Message-ID:

On Tue, 28 Mar 2023 02:59:31 GMT, Dean Long wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Ensure safepoint when processing lock-stack
> 
> src/hotspot/share/runtime/objectMonitor.inline.hpp line 36:
> 
>> 34: #include "runtime/synchronizer.hpp"
>> 35: 
>> 36: inline intptr_t ObjectMonitor::is_entered(JavaThread* current) const {
> 
> Suggestion:
> 
> inline bool ObjectMonitor::is_entered(JavaThread* current) const {

Oh dear, where did that come from? I'm fixing it.
> src/hotspot/share/runtime/synchronizer.cpp line 506: > >> 504: return; >> 505: } >> 506: // Otherwise retry. > > Why is retry important for the new code but not the old code? It is not. There's the off-chance that another thread installs a hash-code (with the new locking this would be possible to do without causing inflation), but I guess that would be a very rare clash. I'm changing the code to not retry and directly dive into inflation instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153318137 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153320332 From rkennke at openjdk.org Thu Mar 30 14:34:49 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 14:34:49 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v37] In-Reply-To: <1ZtuIYVxUaqsHQSGuCJB0qDHqfXzVEyidpaEKcyoxIs=.e9a9b540-5e0c-474d-a89b-09e99ea27ddb@github.com> References: <1ZtuIYVxUaqsHQSGuCJB0qDHqfXzVEyidpaEKcyoxIs=.e9a9b540-5e0c-474d-a89b-09e99ea27ddb@github.com> Message-ID: On Wed, 29 Mar 2023 22:09:02 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Bounds check in lock-stack verification; only do watermark if we have one > > src/hotspot/share/runtime/lockStack.cpp line 69: > >> 67: assert(UseFastLocking && !UseHeavyMonitors, "never use lock-stack when fast-locking is disabled"); >> 68: assert((_offset <= end_offset()), "lockstack overflow: _offset %d end_offset %d", _offset, end_offset()); >> 69: assert((_offset >= start_offset()), "lockstack underflow: _offset %d end_offset %d", _offset, start_offset()); > > You should save a local copy of `end_offset()` and a local copy of > `start_offset()` in an `#ifdef ASSERT ... #endif` code block and > then use those local copies in the `assert()` condition and mesg. 
> That will guard against parallel usage by the target thread versus > the verifying thread. Neither start-offset nor end-offset are changing ever. Those are the hard boundaries of the fixed-sized stack. Ideally both methods would be constexpr, but this is currently not easy because we can't use offsetof(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153358127 From rkennke at openjdk.org Thu Mar 30 14:34:51 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 14:34:51 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v35] In-Reply-To: References: <-xn2Y5lZ-63av1QhbJNDx4saAzgmFT9hE394alHUFHI=.ae58b01f-d6ef-45f3-8e42-fc7b7a552b8e@github.com> Message-ID: On Wed, 29 Mar 2023 22:13:09 GMT, Daniel D. Daugherty wrote: >> Update: In v35 you put back the `!UseFastLocking` check in `jmm_GetThreadInfo()` >> so now the `assert()` won't fire, but you've again changed the behavior of that API >> and now it will be able to observe fewer thread state changes than it did before. > > Please explain why you think this is "not safe". Yes, you can observe state that is in > the process of changing, but do you think that we'll see a crash with allowing > `Threads::owning_thread_from_object()` to be called from a non-safepoint place? I don't think we'd see a crash, but we might get false results when we are scanning the lock-stack of a foreign thread, when that thread does not hold still. I'm not even comfortable doing that cross-stack lock query with the old code. 
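For readers following the lock-stack review comments above, here is a minimal sketch of a fixed-capacity lock-stack with constant bounds and a linear "do I own this object?" scan. It is only an illustration under stated assumptions and not the HotSpot `LockStack` class: the names `SketchLockStack`, `ObjRef`, `try_push`, `start_index` and `end_index` are hypothetical, it uses array indices where the code under review uses a byte `_offset`, and it ignores GC, safepoints and cross-thread access entirely.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's `oop`: an opaque object pointer.
using ObjRef = const void*;

// Illustrative fixed-capacity lock-stack (assumption: index-based, single-threaded).
class SketchLockStack {
public:
    // The PR description mentions a fixed size of 8 elements.
    static constexpr int CAPACITY = 8;

    // The boundaries are compile-time constants and never change.
    static constexpr int start_index() { return 0; }
    static constexpr int end_index()   { return CAPACITY; }

    bool is_empty() const { return _top == start_index(); }
    bool is_full()  const { return _top == end_index(); }
    int  size()     const { return _top; }

    // Fast path: record the object as locked by this thread.
    // Returns false when the stack is full, so the caller can
    // fall back to a slow path (for example, inflating to a monitor).
    bool try_push(ObjRef obj) {
        if (is_full()) {
            return false;
        }
        _base[_top++] = obj;
        return true;
    }

    // Remove and return the most recently pushed object.
    ObjRef pop() {
        assert(!is_empty() && "lock-stack underflow");
        ObjRef obj = _base[--_top];
        _base[_top] = nullptr;  // clear the unused slot to ease verification
        return obj;
    }

    // Answers "does the owning thread hold a fast-lock on obj?"
    // with a linear scan over the (small) live part of the array.
    bool contains(ObjRef obj) const {
        for (int i = start_index(); i < _top; i++) {
            if (_base[i] == obj) {
                return true;
            }
        }
        return false;
    }

private:
    int    _top = 0;             // index of the next free slot
    ObjRef _base[CAPACITY] = {}; // live entries are [0, _top)
};
```

Because the bounds are constants, a verifier can compare the current position against them without taking local copies; what can change underneath a foreign reader is the position and the entries, which is why the thread discusses scanning only when the target thread is known to hold still.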
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1153354742 From alanb at openjdk.org Thu Mar 30 14:34:22 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 30 Mar 2023 14:34:22 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 30 Mar 2023 14:05:36 GMT, Serguei Spitsyn wrote: > Just wanted to check if we want to bump `@since` version of `OpaqueFrameException` from 19 to 21. OpaqueFrameException was added as a permanent API in JDK 19 so it doesn't change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13203#issuecomment-1490410758 From sspitsyn at openjdk.org Thu Mar 30 14:39:28 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 14:39:28 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v12] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. 
> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: minor tweak in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/d05ed921..9969a6c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=10-11 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From rkennke at openjdk.org Thu Mar 30 15:04:07 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 15:04:07 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v44] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor.
It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 133 commits:

 - Merge branch 'master' into JDK-8291555-v2
 - Several small changes based on reviews
 - Don't retry on inflate; use aarch64 ZF from fast_lock(); bunch of small changes based on reviews
 - Splice entry points for InterpreterRuntime::monitorenter()
 - Use ANONYMOUS_OWNER constant in SA
 - Address more of @shipilev's review comments
 - More verification through zapping unused entries, x86 parts
 - More verification through zapping unused entries; get rid of _thread field
 - Bounds check in lock-stack verification; only do watermark if we have one
 - Relax verification in oops_do(), put back UseFastLocking in management.cpp
 - ...
and 123 more: https://git.openjdk.org/jdk/compare/9df20600...815a8209 ------------- Changes: https://git.openjdk.org/jdk/pull/10907/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=43 Stats: 2068 lines in 54 files changed: 1279 ins; 88 del; 701 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Thu Mar 30 15:32:16 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 15:32:16 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <60eMtJqr0OixK0Yv1WCgN-dvX8nkDVZ0suA6GfwN8tQ=.02972477-1bf0-41a6-a243-bea51d6c7413@github.com> On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. 
>> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again This becomes obsolete: `src/hotspot/share/prims/jvmtiTagMap.cpp: // disabled if vritual threads are enabled with --enable-preview` ------------- PR Comment: https://git.openjdk.org/jdk/pull/13203#issuecomment-1490507154 From sspitsyn at openjdk.org Thu Mar 30 15:45:04 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 15:45:04 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. 
This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13203#pullrequestreview-1365421647 From rkennke at openjdk.org Thu Mar 30 16:07:17 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 16:07:17 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v45] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor.
It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Allow old monitorenter entry point when using heavy monitors

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/815a8209..6514f831

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=44
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=43-44

  Stats: 56 lines in 16 files changed: 0 ins; 0 del; 56 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From cjplummer at openjdk.org Thu Mar 30 16:11:15 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 30 Mar 2023 16:11:15 GMT
Subject: RFR: 8305209: JDWP exit error AGENT_ERROR_INVALID_THREAD(203): missing entry in running thread table
Message-ID:

The real purpose of this
PR is to add virtual thread support to ThreadMemoryLeakTest.java, but this exposed bugs in both the debug agent and in TestScaffold, so those are being fixed also (and the debug agent bug is the CR being used).

The debug agent bug is due to a race condition during VM exit. The VM is in the process of shutting down. The debug agent has already disabled JVMTI callbacks and has sent the VMDeathEvent. At this point in time there are also threads exiting that the debug agent knows about, but for which it will not get a ThreadEndEvent because the callbacks are disabled. Thus these threads remain in the debug agent's list of known threads, even though they have exited. The debuggee receives the VMDeathEvent and does a VM.resume(). During the debug agent's handling of the VM.Resume command, it iterates over all known threads and needs to map each to its ThreadNode so it can be resumed, and this mapping requires accessing the JVMTI TLS for the thread. The problem is some of the threads may have exited already, and therefore no longer have TLS. This results in the assert in the debug agent. This debug agent issue was already addressed for platform threads, but not for virtual threads, which is why we started seeing this issue when this test was modified. The fix is to just replicate what is done for platform threads for virtual threads also.

The TestScaffold bug is that if the debuggee crashes/asserts, this is likely to go unnoticed, especially if it happens during VM exit (and the test essentially has already completed). Because of this TestScaffold bug, the debug agent bug above did not result in a test failure. After fixing TestScaffold to check the exitCode of the debuggee process, the test started to appropriately fail until the debug agent was fixed.

One other thing to point out is the OOME issue I started getting frequently when testing with virtual threads.
Since virtual threads are created at a much higher rate than platform threads, their creation started to overwhelm the debugger (actually the JDI implementation). There is already a mechanism in place to do a VM.HoldEvents if JDI has queue up 10,000 events. The problem is that events are coming in so fast that even after doing the VM.HoldEvents, the number of queued events continues to go up for a while, and sometimes reaches 30,000 or more. This raises the peak memory usage of the test quite a bit. Since the test purposely uses a small heap so a memory leak is quickly and reliably detected, the large queue often results in an OOME. Because of this I make virtual threads sleep for 100ms instead of 50ms to slow down their creation, and this resolved the issue. I tested by running all of test/jdk/com/sun/jdi 25 times on each platform with and without virtual thread testing enabled. ------------- Commit messages: - Support virtual thread testing. - Fix issues with missing virtual thread during VM shutdown Changes: https://git.openjdk.org/jdk/pull/13246/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13246&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305209 Stats: 40 lines in 3 files changed: 35 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13246/head:pull/13246 PR: https://git.openjdk.org/jdk/pull/13246 From sspitsyn at openjdk.org Thu Mar 30 16:36:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 16:36:33 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: On Thu, 30 Mar 2023 01:32:28 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1554: >> >>> 1552: } >>> 1553: // Correct jt->jvmti_thread_state() and jt->jvmti_vthread() if necessary. 
>>> 1554: // It was not maintained while notifyJvmti was disabled. >> >> While trying to understand which exact situation we are trying to guard against with this code, I ran the test without the sleeps and without this restore code and I got a crash when deleting a JvmtiThreadState (null dereference of _thread in the ~()). Probably the same crash you mentioned you had. But when debugging the crash I see that the problem is that the assumption that disabling the flag is done when no virtual threads are running is not guaranteed (see my comment there). So I think we are trying to address a case that shouldn't happen in the first place. Also not sure if applying this restore in all cases will be correct, since we might be somewhere at a transition. For example, a thread could have blocked right in the return from notifyJvmtiUnmount() in yieldContinuation(). It will look like virtual because unmount() was not executed yet, and the jvmti_thread_state should be that of the platform thread because we never changed it when mounting. We should leave the state as is but in here we would change it to the virtual thread's jvmti state. The only case I think it makes sense to do these restore steps when enabling the flag is for those threads that are outside a transition with a mounted virtual thread, since we want to adjust the jvmti_thread_state so that it looks right on the next unmount. >> But in any case this is also only needed when using the WhiteBox methods, right? In the intended case (no WhiteBox method used), after we execute this operation to enable the events, we will create the JvmtiThreadStates later in JvmtiExport::get_jvmti_interface() and the correct jvmti_thread_state and jvmti_vthread will be already set for each JavaThread. In that case can we only execute this restore code when using the WhiteBox API? > > I've just pushed my update with refactoring of these corrections in `VM_SetNotifyJvmtiEventsMode` after discussion of this code with Leonid.
I hope it resolved at least part of your concerns. > This kind of problem exists only when we disable+enable notifyJvmti events multiple times for testing purposes. > This code with corrections of jt->jvmti_thread_state() is not needed when we enable notifyJvmti events just once. > You are correct that the ability to disable notifyJvmti events is implemented using the WhiteBox API. > My last update has a comment explaining this. > >> In that case can we only execute this restore code when using the WhiteBox API? > > This is a good suggestion which we already discussed privately with Leonid. > I've got an idea how to implement this check. > Also not sure if applying this restore in all cases will be correct, since we might be somewhere at a transition. For example, a thread could have blocked right in the return from notifyJvmtiUnmount() in yieldContinuation(). It will look like virtual because unmount() was not executed yet, and the jvmti_thread_state should be that of the platform thread because we never changed it when mounting. We should leave the state as is but in here we would change it to the virtual thread's jvmti state. The only case I think it makes sense to do these restore steps when enabling the flag is for those threads that are outside a transition with a mounted virtual thread, since we want to adjust the jvmti_thread_state so that it looks right on the next unmount. Agreed, thanks. In fact, I've already experimented with it. As you noted, this correction is wrong for the unmount transition. Also, we don't need to correct it for the mount transition because it will be corrected by the `JvmtiVTMSTransitionDisabler::VTMS_mount_end()` as it makes a call `thread->rebind_to_jvmti_thread_state_of(vt)`. The only thing that bothers me here is that we fight non-real problems as this code is needed only to support artificial disabling for testing purposes to be able to repeat enable again.
:-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153513204 From sspitsyn at openjdk.org Thu Mar 30 17:10:16 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 17:10:16 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: tweak in count_transitions_and_correct_jvmti_thread_states ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/9969a6c3..1bb250a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=11-12 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From sspitsyn at openjdk.org Thu Mar 30 17:14:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 17:14:24 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5] In-Reply-To: References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com> Message-ID: On Thu, 23 Mar 2023 18:05:47 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comment: remove unneeded function > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 157: > >> 155: >> 156: if (args.length > 0 && args[0].equals("attach")) { // agent loaded into running VM case >> 157: String arg = args.length == 2 ? args[1] : ""; > > I don't see any args being passed in other than "attach"? What might `arg` be set to? Only "attach" can be passed in args. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153556293 From sspitsyn at openjdk.org Thu Mar 30 18:08:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Mar 2023 18:08:18 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Wed, 29 Mar 2023 08:00:36 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. 
> > Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: > > Fix ThreadSleepEvent again I looked at Serviceability changes and related tests. They are good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13203#issuecomment-1490713160 From lmesnik at openjdk.org Thu Mar 30 18:38:24 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 30 Mar 2023 18:38:24 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 17:10:16 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: tweak in count_transitions_and_correct_jvmti_thread_states Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13133#pullrequestreview-1365718520 From mgronlun at openjdk.org Thu Mar 30 18:41:25 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 30 Mar 2023 18:41:25 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v11] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. 
The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. 
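As a rough illustration of how a tool might consume the events described above, the sketch below reads a recording with the standard `jdk.jfr.consumer` API and prints one line per agent event. The event type names (`jdk.JavaAgent`, `jdk.NativeAgent`) and field names are taken from the sample output in this message; whether they are present depends on the JDK build, so each field is guarded with `hasField`. The class name `AgentEvents` and the `summarize` helper are hypothetical, and this is not code from the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class AgentEvents {
    // Event type names as proposed in the PR description (assumed to exist in the JDK in use).
    static final List<String> AGENT_EVENTS = List.of("jdk.JavaAgent", "jdk.NativeAgent");

    // Field names as shown in the sample events; "path" only appears on native agents.
    static final List<String> FIELDS =
            List.of("name", "options", "dynamic", "initializationTime", "path");

    /** Returns one summary line per agent event found in the given recording file. */
    static List<String> summarize(Path recording) throws IOException {
        List<String> lines = new ArrayList<>();
        for (RecordedEvent e : RecordingFile.readAllEvents(recording)) {
            String type = e.getEventType().getName();
            if (!AGENT_EVENTS.contains(type)) {
                continue; // not an agent event
            }
            StringBuilder sb = new StringBuilder(type);
            for (String field : FIELDS) {
                if (e.hasField(field)) {
                    // getValue returns the raw stored value, so no field type is assumed here.
                    Object value = e.getValue(field);
                    sb.append(' ').append(field).append('=').append(value);
                }
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("agents", ".jfr");
        try (Recording r = new Recording()) {
            // Enabling by name is harmless if the event type is unknown to this JDK.
            r.enable("jdk.JavaAgent");
            r.enable("jdk.NativeAgent");
            r.start();
            r.stop();
            r.dump(file);
        }
        summarize(file).forEach(System.out::println);
    }
}
```

On a JDK without these event types, or with no agents loaded, the program is expected to print nothing.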
> > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
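To make the last point concrete, here is a minimal Java sketch of one way a single list can support iteration concurrent with appends: nodes are only ever appended, never removed, and each link is published with a compare-and-set. This is an illustration of the property only, not the HotSpot `agentList` implementation (which is C++); the class name `AppendOnlyList` and its methods are hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicReference;

/** Append-only list: readers may iterate while writers append, without locks. */
public class AppendOnlyList<T> implements Iterable<T> {
    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    private final Node<T> head = new Node<>(null);            // sentinel, holds no element
    private final AtomicReference<Node<T>> tailHint = new AtomicReference<>(head);

    /** Appends at the end, preserving insertion order. */
    public void add(T value) {
        Node<T> node = new Node<>(value);
        Node<T> t = tailHint.get();                           // may lag behind the real tail
        while (!t.next.compareAndSet(null, node)) {
            t = t.next.get();                                 // non-null: nodes are never unlinked
        }
        tailHint.set(node);                                   // only a hint; correctness does not depend on it
    }

    /** Weakly consistent iterator: sees elements linked before it reaches them. */
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> cur = head.next.get();

            @Override
            public boolean hasNext() { return cur != null; }

            @Override
            public T next() {
                if (cur == null) throw new NoSuchElementException();
                T v = cur.value;
                cur = cur.next.get();
                return v;
            }
        };
    }

    public static void main(String[] args) {
        AppendOnlyList<String> agents = new AppendOnlyList<>();
        agents.add("jdwp");
        agents.add("JavaAgent.jar");
        for (String a : agents) {
            System.out.println(a);
        }
    }
}
```

Because links are never changed once set, an iterator never observes a broken chain; the trade-off in this sketch is that elements cannot be removed.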
> > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: reviewer feedback, loading to agent.cpp, bugfix loading statically linked agent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/741b8686..0b10773f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=09-10 Stats: 1079 lines in 9 files changed: 481 ins; 426 del; 172 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From pchilanomate at openjdk.org Thu Mar 30 18:59:25 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Mar 2023 18:59:25 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v8] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 00:55:59 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1550: >> >>> 1548: >>> 1549: if (jt->is_in_VTMS_transition()) { >>> 1550: count++; >> >> For those threads that are in a transition when we enable the events, shouldn't we also set the jvmti_is_in_VTMS_transition field for the corresponding vthread as we do in JvmtiVTMSTransitionDisabler::start_VTMS_transition()? It seems a JvmtiVTMSTransitionDisabler that targets that particular vthread could otherwise proceed after the safepoint while that vthread is still in the transition. The "all" JvmtiVTMSTransitionDisabler won't proceed because that one does check the _VTMS_transition_count counter. I see that in general we won't have access to the vthread oop though. > > Nice catch, thanks! > Fixed now, but will push it together with other potential updates related to your review comments.
The issue I see is that during a transition the vt_oop that we'll get will almost always be that of the carrier thread, because it's almost the first thing we change when unmounting, and the last we change when mounting. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153672474 From mgronlun at openjdk.org Thu Mar 30 18:59:30 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 30 Mar 2023 18:59:30 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v12] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. 
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
> > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'master' into agents - reviewer feedback, loading to agent.cpp, bugfix loading statically linked agent - more cleanup - handle multiple envs with same VMInit callback - more cleanup - cleanup - fixes - remove implementation details - remove JVMPI - cleanup - ... and 5 more: https://git.openjdk.org/jdk/compare/83cf28f9...659e6f3d ------------- Changes: https://git.openjdk.org/jdk/pull/12923/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=11 Stats: 1889 lines in 22 files changed: 1368 ins; 490 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From pchilanomate at openjdk.org Thu Mar 30 18:59:28 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Mar 2023 18:59:28 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: <9SIS30RyTgA-KjW_vQ1byD6b-SlVyJWoXD_IaOxk95I=.a5138742-c319-444e-bb3f-5d63d2106bbb@github.com> On Thu, 30 Mar 2023 16:33:01 GMT, Serguei Spitsyn wrote: >> I've just pushed my update with refactoring of these corrections in `VM_SetNotifyJvmtiEventsMode` after discussion of this code with Leonid.
I hope it resolved at least part of your concerns. >> This kind of problem exists only when we disable+enable notifyJvmti events multiple times for testing purposes. >> This code with corrections of jt->jvmti_thread_state() is not needed when we enable notifyJvmti events just once. >> You are correct that the ability to disable notifyJvmti events is implemented using the WhiteBox API. >> My last update has a comment explaining this. >> >>> In that case can we only execute this restore code when using the WhiteBox API? >> >> This is a good suggestion which we already discussed privately with Leonid. >> I've got an idea how to implement this check. > >> Also not sure if applying this restore in all cases will be correct, since we might be somewhere at a transition. For example, a thread could have blocked right in the return from notifyJvmtiUnmount() in yieldContinuation(). It will look like virtual because unmount() was not executed yet, and the jvmti_thread_state should be that of the platform thread because we never changed it when mounting. We should leave the state as is but in here we would change it to the virtual thread's jvmti state. The only case I think it makes sense to do these restore steps when enabling the flag is for those threads that are outside a transition with a mounted virtual thread, since we want to adjust the jvmti_thread_state so that it looks right on the next unmount. > > Agreed, thanks. In fact, I've already experimented with it. > As you noted, this correction is wrong for the unmount transition. Also, we don't need to correct it for the mount transition because it will be corrected by the `JvmtiVTMSTransitionDisabler::VTMS_mount_end()` as it makes a call `thread->rebind_to_jvmti_thread_state_of(vt)`. > The only thing that bothers me here is that we fight non-real problems as this code is needed only to support artificial disabling for testing purposes to be able to repeat enable again.
:-) The fix to only restore the state outside a transition looks good. Yes, the added testing methods are making this harder :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153670911 From pchilanomate at openjdk.org Thu Mar 30 19:03:24 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Mar 2023 19:03:24 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: Message-ID: <07UH4ks6EGmxIt5mZ3dNPi0YaC8u-xhBNF-Ao9iOAcA=.378b96b5-19e0-4d0a-95d8-83fd44f39024@github.com> On Thu, 30 Mar 2023 17:10:16 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: tweak in count_transitions_and_correct_jvmti_thread_states test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 129: > 127: } > 128: > 129: static synchronized private void startThreads() { So making this method synchronized instead of startThread() will make it less likely that we will face the previous issue, but it is still timing dependent, because the call to start the launcher can return before the launcher reaches here. It will be very unlikely given the sleeps but if we want to guard against any surprises we could have a variable set in startThreads() and in finishThreads() we check and wait until that variable is set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153676258 From mgronlun at openjdk.org Thu Mar 30 19:15:23 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 30 Mar 2023 19:15:23 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v13] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. 
> > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). 
This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
> > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/659e6f3d..c024fac5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From duke at openjdk.org Thu Mar 30 19:20:15 2023 From: duke at openjdk.org (Bernd) Date: Thu, 30 Mar 2023 19:20:15 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v12] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 18:59:30 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. 
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initialization = 12:31:15.574 (2023-03-08) >> initializationTime = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initialization = 12:31:31.037 (2023-03-08) >> initializationTime = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". 
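As a rough illustration of how these events could be consumed once a recording exists, the sketch below reads a .jfr file with the standard jdk.jfr.consumer API and prints one line per jdk.JavaAgent event. The field names (name, options, dynamic) are taken from the examples in this description rather than from a released JDK, so every access is guarded with hasField(); the class name JavaAgentEventDump and the describe() helper are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper, not part of the PR: lists jdk.JavaAgent events in a recording.
final class JavaAgentEventDump {

    // Formats the three fields shown in the examples of this thread.
    static String describe(String name, String options, boolean dynamic) {
        return name + " [" + (dynamic ? "dynamic" : "command line") + "] options=" + options;
    }

    // Reads every event in the file and keeps only the jdk.JavaAgent ones.
    static List<String> summarize(Path recording) throws IOException {
        List<String> lines = new ArrayList<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(recording)) {
            if (!"jdk.JavaAgent".equals(event.getEventType().getName())) {
                continue;
            }
            // Assumption: field names as proposed here; guard in case they differ.
            String name = event.hasField("name") ? event.getString("name") : "?";
            String options = event.hasField("options") ? event.getString("options") : "";
            boolean dynamic = event.hasField("dynamic") && event.getBoolean("dynamic");
            lines.add(describe(name, options, dynamic));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JavaAgentEventDump <recording.jfr>");
            return;
        }
        for (String line : summarize(Path.of(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Because the event is periodic, the same agent may show up several times in one recording; a real tool would probably de-duplicate on name and options.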
>> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initialization = 12:31:36.142 (2023-03-08) >> initializationTime = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent.
This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from its older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into agents > - reviewer feedback, loading to agent.cpp, bugfix loading statically linked agent > - more cleanup > - handle multiple envs with same VMInit callback > - more cleanup > - cleanup > - fixes > - remove implementation details > - remove JVMPI > - cleanup > - ... and 5 more: https://git.openjdk.org/jdk/compare/83cf28f9...659e6f3d Hm, not sure how helpful the events are, but if you really want them to be informative, how about adding the path of the jar file or maybe the class names? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1490802537 From mgronlun at openjdk.org Thu Mar 30 19:20:11 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 30 Mar 2023 19:20:11 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v14] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents.
> > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. 
> > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. 
At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from its older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
> > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: restore misssing frees ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/c024fac5..ab74621b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=12-13 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 30 19:23:20 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 30 Mar 2023 19:23:20 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v14] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 19:20:11 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
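The description does not show the new list itself. Purely to illustrate what "allows for concurrent iterations" can mean, here is a toy append-only linked list in Java: writers publish nodes with a compare-and-set on the tail's next link, and readers simply follow the links, so an iteration never sees a half-built node. This is a sketch under that assumption, not the HotSpot AgentList code (which is C++); the name AppendOnlyList is hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: an append-only list that tolerates concurrent add() and iteration.
final class AppendOnlyList<T> implements Iterable<T> {

    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    // Sentinel node; real elements start at head.next.
    private final Node<T> head = new Node<>(null);

    // Walk to the current tail and try to link the new node; retry if another thread won.
    void add(T value) {
        Node<T> node = new Node<>(value);
        Node<T> cur = head;
        while (true) {
            Node<T> next = cur.next.get();
            if (next != null) {
                cur = next;                       // not the tail yet, keep walking
            } else if (cur.next.compareAndSet(null, node)) {
                return;                           // linked in, now visible to readers
            }
        }
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> cur = head.next.get();

            @Override
            public boolean hasNext() { return cur != null; }

            @Override
            public T next() {
                if (cur == null) throw new NoSuchElementException();
                T value = cur.value;
                cur = cur.next.get();             // may observe nodes appended after creation
                return value;
            }
        };
    }
}
```

Since nodes are never unlinked, an iterator needs no lock; removal would need a different design.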
>> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initialization = 12:31:15.574 (2023-03-08) >> initializationTime = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initialization = 12:31:31.037 (2023-03-08) >> initializationTime = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
>> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initialization = 12:31:36.142 (2023-03-08) >> initializationTime = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp.
>> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from its older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > restore misssing frees > Hm, not sure how helpful the events are, but if you want them to be informative, how about adding the path of the jar file or maybe the class names? It has been considered. The full path of the Java .jar might not be that interesting, and it adds much more complexity to resolve it, as most are relative to the classpath. ClassName will be whatever class exports the premain or agentmain. We discussed this internally and decided that if there is an actual demand for it, we can add a "path" field to the JavaAgent event type. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1490811900 From mgronlun at openjdk.org Thu Mar 30 19:34:22 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 30 Mar 2023 19:34:22 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 12:26:16 GMT, Markus Grönlund wrote: > I've had a good look through now and have a better sense of the refactoring. Seems good. > > I'll wait for any tweaks before hitting the approve button though. > > Thanks Moving the loading logic to the agent.cpp module was a bit harder than I initially thought. It also exposed a bug in how statically linked libraries were loaded - now fixed. Sorry for the large update. Thanks again for having a look.
>> src/hotspot/share/prims/agentList.cpp line 542: >> >>> 540: >>> 541: // Invoke the Agent_OnAttach function >>> 542: JavaThread* THREAD = JavaThread::current(); // For exception macros. >> >> Nit: just use `current` rather than `THREAD` and don't use the exception macros. > > Ported as is but good point, will update. Updated - cheers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1490828465 PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1153703484 From mgronlun at openjdk.org Thu Mar 30 19:34:26 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 30 Mar 2023 19:34:26 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v9] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 12:19:50 GMT, Markus Grönlund wrote: >> src/hotspot/share/prims/agent.cpp line 41: >> >>> 39: char* copy = AllocateHeap(length + 1, mtInternal); >>> 40: strncpy(copy, str, length + 1); >>> 41: assert(strncmp(copy, str, length + 1) == 0, "invariant"); >> >> Unclear what you are checking here. Don't you trust strncpy? > > Maybe a bit paranoid, yes. I can clean up. Updated to use os::strdup - cheers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1153702643 From mgronlun at openjdk.org Thu Mar 30 19:34:27 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 30 Mar 2023 19:34:27 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: <-OvxuwPKbYU514MyCXdcC5-0Nt1ftlipuUFueCe3DGc=.3b0140b3-ef41-4ad6-9515-ec6e9ef40250@github.com> References: <-OvxuwPKbYU514MyCXdcC5-0Nt1ftlipuUFueCe3DGc=.3b0140b3-ef41-4ad6-9515-ec6e9ef40250@github.com> Message-ID: On Tue, 14 Mar 2023 12:22:16 GMT, Markus Grönlund wrote: >> n.b. that also applies for accesses/updates to field _next. > > I wanted all accesses to use the iterator. The only access is given to the iterator and AgentList by way of being friends. No need to expose more.
I updated all external access to use getters/setters. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1153703241 From alanb at openjdk.org Thu Mar 30 19:53:19 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 30 Mar 2023 19:53:19 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: <60eMtJqr0OixK0Yv1WCgN-dvX8nkDVZ0suA6GfwN8tQ=.02972477-1bf0-41a6-a243-bea51d6c7413@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <60eMtJqr0OixK0Yv1WCgN-dvX8nkDVZ0suA6GfwN8tQ=.02972477-1bf0-41a6-a243-bea51d6c7413@github.com> Message-ID: On Thu, 30 Mar 2023 15:29:43 GMT, Serguei Spitsyn wrote: > This becomes obsolete: > `src/hotspot/share/prims/jvmtiTagMap.cpp: // disabled if vritual threads are enabled with --enable-preview` A left-over from JDK-8285739, thanks, we can remove that comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13203#issuecomment-1490855154 From alanb at openjdk.org Thu Mar 30 19:58:24 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 30 Mar 2023 19:58:24 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v4] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <1VSMgL-O6sBz4fZGONbHPF_lHILBy8R4bsRrJ0Rgtdg=.d0a651d9-03d1-4e71-9d4f-8778c26f86b1@github.com> On Wed, 29 Mar 2023 21:43:38 GMT, Paul Sandoz wrote: >> Alan Bateman has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix ThreadSleepEvent again > > src/java.base/share/classes/java/lang/Thread.java line 1546: > >> 1544: // bind thread to container >> 1545: if (this.container != null) >> 1546: throw new IllegalThreadStateException(); > > This check is not replicated in `VirtualThread::start`, i think the CAS protects against that. 
Maybe assert instead in the virtual thread implementation, thereby the comment in `setThreadContainer` can be changed to something like "`this.container` checked/asserted to be != null before call to Virtual/Thread::start(ThreadContainer)" ? Yes, they are different. If adding a platform thread to a container fails with OOME then it does so when its threadStatus is 0. This check just prevents another attempt to start it. Virtual threads work differently but I can add an assert to make this clearer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1153726790 From dcubed at openjdk.org Thu Mar 30 19:59:54 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 30 Mar 2023 19:59:54 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v45] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 16:07:17 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor.
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol.
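To make the stack-locking step described above concrete, here is a toy model in Java. It is only a sketch: a real object header and a real stack address are replaced by an AtomicLong and a made-up aligned number, the "unlocked" tag value 01 is an assumption not stated in the text (only 00 for 'stack-locked' is), and recursion and inflation are not modelled. All names (StackLockSketch, Header, StackSlot) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: mimics "save the displaced header, then CAS a stack pointer into the header".
final class StackLockSketch {
    static final long TAG_MASK = 0b11;
    static final long LOCKED   = 0b00; // from the text: low bits 00 mean stack-locked
    static final long UNLOCKED = 0b01; // assumption: tag of an unlocked header

    // Toy object header: one word that is only ever changed with CAS.
    static final class Header {
        final AtomicLong mark;
        Header(long payload) { mark = new AtomicLong((payload << 2) | UNLOCKED); }
    }

    // Toy stack slot; 'address' stands in for a 4-byte-aligned stack address.
    static final class StackSlot {
        final long address;
        long displacedHeader;
        StackSlot(long address) {
            if ((address & TAG_MASK) != 0) throw new IllegalArgumentException("unaligned address");
            this.address = address;
        }
    }

    // Save the current header in the slot, then try to swing the header to the slot address.
    static boolean tryLock(Header h, StackSlot slot) {
        long mark = h.mark.get();
        if ((mark & TAG_MASK) != UNLOCKED) {
            return false;                 // already locked: a real VM would take a slow path
        }
        slot.displacedHeader = mark;
        return h.mark.compareAndSet(mark, slot.address | LOCKED);
    }

    // The header points into the owner's stack, which is how ownership is identified.
    static boolean isOwner(Header h, StackSlot slot) {
        return h.mark.get() == (slot.address | LOCKED);
    }

    // Restore the displaced header; fails if the header no longer points at this slot.
    static boolean unlock(Header h, StackSlot slot) {
        return h.mark.compareAndSet(slot.address | LOCKED, slot.displacedHeader);
    }
}
```

The model shows why the header is "overloaded" while locked: the original header bits live only in the stack slot until unlock restores them.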
This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Allow old monitorenter entry point when using heavy monitors

The project is now baselined on jdk-21+17-1334.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1490861748

From dcubed at openjdk.org Thu Mar 30 20:06:54 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Thu, 30 Mar 2023 20:06:54 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v45]
In-Reply-To: References: Message-ID:

On Thu, 30 Mar 2023 16:07:17 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable.
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). 
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? 
| 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow old monitorenter entry point when using heavy monitors I tried to build the latest again on my MBP13 and 'release' now fails to build in a different way: In file included from /System/Volumes/Data/work/shared/bug_hunt/8291555_for_jdk21.git/open/src/hotspot/cpu/x86/assembler_x86.cpp:33: /System/Volumes/Data/work/shared/bug_hunt/8291555_for_jdk21.git/open/src/hotspot/share/runtime/objectMonitor.hpp:161:26: error: constexpr function never produces a constant expression [-Winvalid-constexpr] static constexpr void* anon_owner_ptr() { return reinterpret_cast(ANONYMOUS_OWNER); } ^ /System/Volumes/Data/work/shared/bug_hunt/8291555_for_jdk21.git/open/src/hotspot/share/runtime/objectMonitor.hpp:161:52: note: reinterpret_cast is not allowed in a constant expression static constexpr void* anon_owner_ptr() { return reinterpret_cast(ANONYMOUS_OWNER); } ^ 1 error generated. It looks like GHA is also unhappy. I'll hold off with doing anything with v44 for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1490876294 From rkennke at openjdk.org Thu Mar 30 20:27:55 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 30 Mar 2023 20:27:55 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v46] In-Reply-To: References: Message-ID: <1d7cQrxIUiC86oRLdiq9O2XoZaVUJXK7FCD3MOv9lnQ=.7ed982a6-ece6-407a-9bb0-c5af50f468ee@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time).
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) 
a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix release build

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/6514f831..eaf2286f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=45
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=44-45

  Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From sspitsyn at openjdk.org Thu Mar 30 21:51:21 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Thu, 30 Mar 2023 21:51:21 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13]
In-Reply-To: <07UH4ks6EGmxIt5mZ3dNPi0YaC8u-xhBNF-Ao9iOAcA=.378b96b5-19e0-4d0a-95d8-83fd44f39024@github.com>
References: <07UH4ks6EGmxIt5mZ3dNPi0YaC8u-xhBNF-Ao9iOAcA=.378b96b5-19e0-4d0a-95d8-83fd44f39024@github.com>
Message-ID:

On Thu, 30 Mar 2023 19:00:03 GMT, Patricio Chilano Mateo wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   review: tweak in count_transitions_and_correct_jvmti_thread_states
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 129:
>
>> 127: }
>> 128:
>> 129: static synchronized private void startThreads() {
>
> So making this method synchronized instead of startThread() will make it less likely that we will face the previous issue, but it is still timing dependent, because the call to start the launcher can return before the launcher reaches here. It will be very unlikely given the sleeps but if we want to guard against any surprises we could have a variable set in startThreads() and in finishThreads() we check and wait until that variable is set.
Thank you for the concern. The `startThread()` below which is called from `startThreads()` has the call to `thread.ensureReady()` which waits until the target tested thread really starts and sets the `threadReady` field. So, there is no race condition as the `startThreads()` and `finishThreads()` are synchronized methods.

    static private void startThread(int i) {
        String name = "TestedThread" + i;
        TestedThread thread = new TestedThread(name);
        vts[i] = Thread.ofVirtual().name(name).start(thread);
        thread.ensureReady();
        threads[i] = thread;
        log("# Java: started vthread: " + name);
    }

    static synchronized private void startThreads() {
        log("\n# Java: Starting vthreads");
        for (int i = 0; i < VTHREADS_CNT; i++) {
            sleep(1);
            startThread(i);
        }
    }

    static private synchronized void finishThreads() {
        try {
            for (int i = 0; i < VTHREADS_CNT; i++) {
                TestedThread thread = threads[i];
                thread.letFinish();
                vts[i].join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

Please, let me know if I'm missing anything.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153820099

From dcubed at openjdk.org Thu Mar 30 22:03:52 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Thu, 30 Mar 2023 22:03:52 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v46]
In-Reply-To: <1d7cQrxIUiC86oRLdiq9O2XoZaVUJXK7FCD3MOv9lnQ=.7ed982a6-ece6-407a-9bb0-c5af50f468ee@github.com>
References: <1d7cQrxIUiC86oRLdiq9O2XoZaVUJXK7FCD3MOv9lnQ=.7ed982a6-ece6-407a-9bb0-c5af50f468ee@github.com>
Message-ID:

On Thu, 30 Mar 2023 20:27:55 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix release build

I'm able to build release, fastdebug and slowdebug on my MBP13 with v45.
I submitted v45 to Mach5 and the windows-x64 build is failing:

    [2023-03-30T21:53:52,394Z] c:\sb\prod\1680212926\workspace\open\src\hotspot\share\runtime/lockStack.inline.hpp(37): error C2220: the following warning is treated as an error
    [2023-03-30T21:53:52,395Z] c:\sb\prod\1680212926\workspace\open\src\hotspot\share\runtime/lockStack.inline.hpp(37): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
    [2023-03-30T21:53:52,397Z] lib/CompileJvm.gmk:145: recipe for target '/cygdrive/c/sb/prod/1680212926/workspace/build/windows-x64/hotspot/variant-server/libjvm/objs/deoptimization.obj' failed
    [2023-03-30T21:53:52,397Z] make[3]: *** [/cygdrive/c/sb/prod/1680212926/workspace/build/windows-x64/hotspot/variant-server/libjvm/objs/deoptimization.obj] Error 1

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1491019033

From cjplummer at openjdk.org Thu Mar 30 22:11:25 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 30 Mar 2023 22:11:25 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v5]
In-Reply-To:
References: <27JLa60WeywSMcJj-6KfaQD8RBnwBbAvcc0gecc-3h4=.a2a25b70-9e90-4551-af90-35aed3d57b59@github.com>
Message-ID:

On Thu, 30 Mar 2023 17:11:33 GMT, Serguei Spitsyn wrote:

>> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 157:
>>
>>> 155:
>>> 156: if (args.length > 0 && args[0].equals("attach")) { // agent loaded into running VM case
>>> 157: String arg = args.length == 2 ? args[1] : "";
>>
>> I don't see any args being passed in other than "attach"? What might `arg` be set to?
>
> Only "attach" can be passed in args.

So if args[0] is "attach", then arg will always be set to "attach", which means it could be passed in as a literal to loadAgentLibrary() below. But even that doesn't seem to be necessary since the agent doesn't seem to look at the args passed in.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153836810

From cjplummer at openjdk.org Thu Mar 30 22:20:23 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 30 Mar 2023 22:20:23 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13]
In-Reply-To:
References:
Message-ID:

On Thu, 30 Mar 2023 17:10:16 GMT, Serguei Spitsyn wrote:

>> The fix is to enable virtual threads support for late binding JVMTI agents.
>> The fix includes:
>> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol.
>> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API.
>> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update.
>> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>>
>> Testing:
>> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>> - The originally failed tests are expected to pass now:
>>   `runtime/vthread/RedefineClass.java`
>>   `runtime/vthread/TestObjectAllocationSampleEvent.java`
>> - In progress: Run the tiers 1-6 to make sure there are no regression.
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   review: tweak in count_transitions_and_correct_jvmti_thread_states

test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 90:

> 88: * - disable notifyJvmti events mode
> 89: * - start the platform launcher thread which starts N of virtual thread
> 90: * - enable notifyJvmti events mode after about hapf of virtual thread started

Suggestion:

    * - enable notifyJvmti events mode after about half of the virtual threads have started

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153843284

From cjplummer at openjdk.org Thu Mar 30 22:34:28 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 30 Mar 2023 22:34:28 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13]
In-Reply-To:
References:
Message-ID:

On Thu, 30 Mar 2023 17:10:16 GMT, Serguei Spitsyn wrote:

>> The fix is to enable virtual threads support for late binding JVMTI agents.
>> The fix includes:
>> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol.
>> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API.
>> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update.
>> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>>
>> Testing:
>> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>> - The originally failed tests are expected to pass now:
>>   `runtime/vthread/RedefineClass.java`
>>   `runtime/vthread/TestObjectAllocationSampleEvent.java`
>> - In progress: Run the tiers 1-6 to make sure there are no regression.
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   review: tweak in count_transitions_and_correct_jvmti_thread_states

test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 45:

> 43: // to have sleep() calls to provide yielding as some frequency of virtual
> 44: // thread mount state transitions is needed for this test scenario.
> 45: class TestedThread extends Thread {

Shouldn't this be a Runnable instead of a Thread? I would also suggest not using the term "thread" here. Maybe "task"? Otherwise code like the following is misleading:

    TestedThread thread = threads[i];
    thread.letFinish();

There are no threads being referenced in this code, yet the term "thread" is used 4 times.

test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 170:

> 168: // Disable notifyJvmti events mode at test cycle start.
> 169: // It is unsafe to do so if any virtual threads are executed.
> 170: setVirtualThreadsNotifyJvmtiMode(iter, false);

What happens if the main thread is a virtual thread because the virtual thread wrapper is being used? Is there any concern with disabling notifications in this case?

test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 172:

> 170: setVirtualThreadsNotifyJvmtiMode(iter, false);
> 171:
> 172: Thread tt = Thread.ofPlatform().name("StartThreadsTest").start(ToggleNotifyJvmtiTest::startThreads);

Why does each test cycle need to be run in a separate platform thread?
Can't you just use the main test thread? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153845337 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153850702 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153849125 From pchilanomate at openjdk.org Thu Mar 30 23:14:24 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Mar 2023 23:14:24 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: <07UH4ks6EGmxIt5mZ3dNPi0YaC8u-xhBNF-Ao9iOAcA=.378b96b5-19e0-4d0a-95d8-83fd44f39024@github.com> Message-ID: On Thu, 30 Mar 2023 21:48:22 GMT, Serguei Spitsyn wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 129: >> >>> 127: } >>> 128: >>> 129: static synchronized private void startThreads() { >> >> So making this method synchronized instead of startThread() will make it less likely that we will face the previous issue, but it is still timing dependent, because the call to start the launcher can return before the launcher reaches here. It will be very unlikely given the sleeps but if we want to guard against any surprises we could have a variable set in startThreads() and in finishThreads() we check and wait until that variable is set. > > Thank you for the concern. > The `startThread()` below which is called from `startThreads()` has the call to `thread.ensureReady()` which waits until the target tested thread really starts and sets the `threadReady` field. So, there is no race condition as the `startThreads()` and `finishThreads()` are synchronized methods. 
> > > static private void startThread(int i) { > String name = "TestedThread" + i; > TestedThread thread = new TestedThread(name); > vts[i] = Thread.ofVirtual().name(name).start(thread); > thread.ensureReady(); > threads[i] = thread; > log("# Java: started vthread: " + name); > } > > static synchronized private void startThreads() { > log("\n# Java: Starting vthreads"); > for (int i = 0; i < VTHREADS_CNT; i++) { > sleep(1); > startThread(i); > } > } > > static private synchronized void finishThreads() { > try { > for (int i = 0; i < VTHREADS_CNT; i++) { > TestedThread thread = threads[i]; > thread.letFinish(); > vts[i].join(); > } > } catch (InterruptedException e) { > throw new RuntimeException(e); > } > } > > Please, let me know if I'm missing anything. So the race I am talking about is between the main thread running finishThreads() and the launcher thread running startThreads(). The main thread could execute finishThreads() before the launcher executes startThreads(). If you comment out the two first sleeps in run_test_cycle() you can actually see the issue. Again, given that the sleeps are there it is an unlikely scheduling, but if we want to avoid depending on timing we can add that extra synchronization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153872917 From jpai at openjdk.org Fri Mar 31 01:42:24 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 31 Mar 2023 01:42:24 GMT Subject: RFR: 8304988: unnecessary dash in @param gives double-dash in docs In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 05:51:57 GMT, Jaikiran Pai wrote: > Can I please get a review of this trivial doc only change which addresses https://bugs.openjdk.org/browse/JDK-8304988? > > I've run `make docs-image` after this change and the generated javadoc for this class looks fine. Since this is `serviceability` area, I'm looking for another review, from a Reviewer, for this trivial change. Anyone? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13239#issuecomment-1491171397 From sspitsyn at openjdk.org Fri Mar 31 01:58:28 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 Mar 2023 01:58:28 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 22:17:01 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: tweak in count_transitions_and_correct_jvmti_thread_states > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 90: > >> 88: * - disable notifyJvmti events mode >> 89: * - start the platform launcher thread which starts N of virtual thread >> 90: * - enable notifyJvmti events mode after about hapf of virtual thread started > > Suggestion: > > * - enable notifyJvmti events mode after about half of the virtual threads have started Fixed now, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1153938774 From cjplummer at openjdk.org Fri Mar 31 04:05:13 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 31 Mar 2023 04:05:13 GMT Subject: RFR: 8304988: unnecessary dash in @param gives double-dash in docs In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 05:51:57 GMT, Jaikiran Pai wrote: > Can I please get a review of this trivial doc only change which addresses https://bugs.openjdk.org/browse/JDK-8304988? > > I've run `make docs-image` after this change and the generated javadoc for this class looks fine. Looks good. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13239#pullrequestreview-1366218465 From dholmes at openjdk.org Fri Mar 31 04:56:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 Mar 2023 04:56:22 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v14] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 19:20:11 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. 
>> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. 
If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization - as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. 
>> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > restore misssing frees src/hotspot/share/jfr/metadata/metadata.xml line 1190: > 1188: > 1189: > 1190: Nit: In the last sentence I suggest s/the duration/it is the duration/ src/hotspot/share/prims/agent.cpp line 533: > 531: if (thread->is_pending_jni_exception_check()) { > 532: thread->clear_pending_jni_exception_check(); > 533: } Unsure why we pretend the agent checked this - don't we want -Xcheck:jni to report a bug in the agent? src/hotspot/share/prims/agent.cpp line 536: > 534: } > 535: > 536: // Agent_OnAttach may have used JNI Copied from above? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1153965163 PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1153967066 PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1153966809 From jpai at openjdk.org Fri Mar 31 05:03:24 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 31 Mar 2023 05:03:24 GMT Subject: RFR: 8304988: unnecessary dash in @param gives double-dash in docs In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 05:51:57 GMT, Jaikiran Pai wrote: > Can I please get a review of this trivial doc only change which addresses https://bugs.openjdk.org/browse/JDK-8304988? > > I've run `make docs-image` after this change and the generated javadoc for this class looks fine. Thank you Alan and Chris for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13239#issuecomment-1491289873 From jpai at openjdk.org Fri Mar 31 05:03:25 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 31 Mar 2023 05:03:25 GMT Subject: Integrated: 8304988: unnecessary dash in @param gives double-dash in docs In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 05:51:57 GMT, Jaikiran Pai wrote: > Can I please get a review of this trivial doc only change which addresses https://bugs.openjdk.org/browse/JDK-8304988? > > I've run `make docs-image` after this change and the generated javadoc for this class looks fine. This pull request has now been integrated. Changeset: 787832a5 Author: Jaikiran Pai URL: https://git.openjdk.org/jdk/commit/787832a58677205c9a11ae100dd8a2fbddb30a4a Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod 8304988: unnecessary dash in @param gives double-dash in docs Reviewed-by: alanb, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/13239 From sspitsyn at openjdk.org Fri Mar 31 05:20:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 Mar 2023 05:20:21 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 22:21:06 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: tweak in count_transitions_and_correct_jvmti_thread_states > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 45: > >> 43: // to have sleep() calls to provide yielding as some frequency of virtual >> 44: // thread mount state transitions is needed for this test scenario. >> 45: class TestedThread extends Thread { > > Shouldn't this be a Runnable instead of a Thread? I would also suggest not using the term "thread" here. Maybe "task"? 
Otherwise code like the following is misleading: > > > TestedThread thread = threads[i]; > thread.letFinish(); > > There are no threads being referenced in this code, yet the term "thread" is used 4 times. Thank you for the suggestion. I've renamed/rearranged it now. Let me know if it is not what you expected. > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 172: > >> 170: setVirtualThreadsNotifyJvmtiMode(iter, false); >> 171: >> 172: Thread tt = Thread.ofPlatform().name("StartThreadsTest").start(ToggleNotifyJvmtiTest::startThreads); > > Why does each test cycle need to be run in a separate platform thread? Can't you just use the main test thread? You are right, thanks. I've made the suggested change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1154022482 PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1154023063 From sspitsyn at openjdk.org Fri Mar 31 05:20:23 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 Mar 2023 05:20:23 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: <07UH4ks6EGmxIt5mZ3dNPi0YaC8u-xhBNF-Ao9iOAcA=.378b96b5-19e0-4d0a-95d8-83fd44f39024@github.com> Message-ID: On Thu, 30 Mar 2023 23:11:09 GMT, Patricio Chilano Mateo wrote: >> Thank you for the concern. >> The `startThread()` below which is called from `startThreads()` has the call to `thread.ensureReady()` which waits until the target tested thread really starts and sets the `threadReady` field. So, there is no race condition as the `startThreads()` and `finishThreads()` are synchronized methods. 
>> >> >> static private void startThread(int i) { >> String name = "TestedThread" + i; >> TestedThread thread = new TestedThread(name); >> vts[i] = Thread.ofVirtual().name(name).start(thread); >> thread.ensureReady(); >> threads[i] = thread; >> log("# Java: started vthread: " + name); >> } >> >> static synchronized private void startThreads() { >> log("\n# Java: Starting vthreads"); >> for (int i = 0; i < VTHREADS_CNT; i++) { >> sleep(1); >> startThread(i); >> } >> } >> >> static private synchronized void finishThreads() { >> try { >> for (int i = 0; i < VTHREADS_CNT; i++) { >> TestedThread thread = threads[i]; >> thread.letFinish(); >> vts[i].join(); >> } >> } catch (InterruptedException e) { >> throw new RuntimeException(e); >> } >> } >> >> Please, let me know if I'm missing anything. > > So the race I am talking about is between the main thread running finishThreads() and the launcher thread running startThreads(). The main thread could execute finishThreads() before the launcher executes startThreads(). If you comment out the two first sleeps in run_test_cycle() you can actually see the issue. Again, given that the sleeps are there it is an unlikely scheduling, but if we want to avoid depending on timing we can add that extra synchronization. Sorry, I understood you incorrectly. You are right, there is this kind of race here. I've rearranged this area a little bit and hope it is cleaner now. Now, both `startVirtualThreads()` and `finishVirtualThreads()` are invoked on the main thread, so they do not need to be synchronized any more. Also, the calls to `ensureReady()` are moved to `finishVirtualThreads()` right before the call to `letFinish()`. 
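The arrangement described in this reply can be illustrated with a small stand-alone sketch. It is not the code from the PR: the class `ReadyBeforeFinishSketch`, the nested `TestedTask` and the method `runCycle` are hypothetical names, and only `ensureReady()` and `letFinish()` mirror names from the thread. Platform threads are used so that the sketch compiles on older JDKs; on a JDK with final virtual thread support the `new Thread(...)` call could be replaced by `Thread.ofVirtual().start(task)`. The point is that starting and finishing both happen on the calling thread, and each task is waited on until it is ready before it is told to finish, so nothing depends on sleeps.

```java
import java.util.concurrent.CountDownLatch;

class ReadyBeforeFinishSketch {
    // Hypothetical stand-in for the task class discussed in the review.
    static final class TestedTask implements Runnable {
        private final CountDownLatch ready = new CountDownLatch(1);
        private volatile boolean shouldFinish = false;

        @Override
        public void run() {
            ready.countDown();              // signal that the task is really running
            while (!shouldFinish) {
                try {
                    Thread.sleep(1);        // yield periodically until asked to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        void ensureReady() throws InterruptedException {
            ready.await();                  // blocks until run() has started
        }

        void letFinish() {
            shouldFinish = true;
        }
    }

    // Starts 'count' tasks and finishes them, all from the calling thread.
    // Returns the number of tasks that were joined.
    static int runCycle(int count) throws InterruptedException {
        TestedTask[] tasks = new TestedTask[count];
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            tasks[i] = new TestedTask();
            threads[i] = new Thread(tasks[i], "TestedTask" + i);
            threads[i].start();
        }
        int finished = 0;
        for (int i = 0; i < count; i++) {
            tasks[i].ensureReady();         // no timing assumption: wait for the task itself
            tasks[i].letFinish();
            threads[i].join();
            finished++;
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished tasks: " + runCycle(8));
    }
}
```

Because `ensureReady()` blocks on a latch that only the task's own `run()` releases, the finishing loop cannot get ahead of a task that has not started yet, which is the timing dependence raised in the review.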
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1154021018 From sspitsyn at openjdk.org Fri Mar 31 05:41:23 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 Mar 2023 05:41:23 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 22:31:02 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: tweak in count_transitions_and_correct_jvmti_thread_states > > test/hotspot/jtreg/serviceability/jvmti/vthread/ToggleNotifyJvmtiTest/ToggleNotifyJvmtiTest.java line 170: > >> 168: // Disable notifyJvmti events mode at test cycle start. >> 169: // It is unsafe to do so if any virtual threads are executed. >> 170: setVirtualThreadsNotifyJvmtiMode(iter, false); > > What happens if the main thread is a virtual thread because the virtual thread wrapper is being used? Is there any concern with disabling notifications in this case? I've tested it and it is kind of working now. But I'd like to minimize risk now and avoid testing something which is not really needed. So, this is why the testing cycles are still there. The `serviceability/jvmti/vthread` tests are not supposed to be run with the virtual thread wrapper. It is why we have the special problem list `ProblemList-vthread.txt` in Loom repository. I guess, we should replicate the same or alike approach in the main line when virtual thread wrapper is introduced. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1154034742 From rkennke at openjdk.org Fri Mar 31 06:06:47 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 31 Mar 2023 06:06:47 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v47] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. 
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use int instead of size_t for cached offsets, to match the uncached offset type and avoid build failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/eaf2286f..adeccaae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=45-46 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From jwaters at openjdk.org Fri Mar 31 06:16:17 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 31 Mar 2023 06:16:17 GMT Subject: RFR: 8305341: Alignment outside of HotSpot should be enforced by alignas instead of compiler specific attributes Message-ID: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> C11 has been stable for a long time on all platforms, so native code can use the standard alignas operator for alignment requirements ------------- Commit messages: - - GSSLibStub.c - ArrayReferenceImpl.c - Alignment outside of HotSpot should be enforced by alignas instead of compiler specific attributes Changes: https://git.openjdk.org/jdk/pull/13258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13258&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305341 Stats: 12 lines in 3 files changed: 3 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/13258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13258/head:pull/13258 PR: https://git.openjdk.org/jdk/pull/13258 From sspitsyn at openjdk.org Fri Mar 31 06:52:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 Mar 2023 06:52:18 GMT Subject: RFR: 8297286: runtime/vthread 
tests crashing after JDK-8296324 [v14] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: addressed next round of review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/1bb250a7..aef87273 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=12-13 Stats: 34 lines in 1 file changed: 3 ins; 7 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From stuefe at openjdk.org Fri Mar 31 07:28:49 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 31 Mar 2023 07:28:49 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v47] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 06:06:47 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
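A minimal sketch of the stack-locking step described above, as a toy Java model rather than HotSpot code. The class, the `StackSlot` type and the addresses are hypothetical and exist only for illustration; the 00/01 tag values are the ones named in the description.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of stack-locking on a 64-bit object header; not HotSpot code. */
public class StackLockSketch {
    static final long LOCK_MASK = 0b11; // the lowest two header bits
    static final long LOCKED    = 0b00; // stack-locked: upper bits hold a stack address
    static final long UNLOCKED  = 0b01; // unlocked header

    /** Hypothetical stand-in for the stack slot that holds the displaced header. */
    static final class StackSlot {
        final long address;   // assumed aligned, so its low two bits are 00
        long displacedHeader; // copy of the unlocked header while the lock is held
        StackSlot(long address) { this.address = address; }
    }

    /** Save the unlocked header in the slot, then CAS the slot address into the header. */
    static boolean tryLock(AtomicLong header, StackSlot slot) {
        long unlocked = (header.get() & ~LOCK_MASK) | UNLOCKED;
        slot.displacedHeader = unlocked;
        // Fails if the header is not in the unlocked state we expect.
        return header.compareAndSet(unlocked, slot.address | LOCKED);
    }

    /** Restore the displaced header if this slot still owns the lock. */
    static void unlock(AtomicLong header, StackSlot slot) {
        header.compareAndSet(slot.address | LOCKED, slot.displacedHeader);
    }

    static boolean isStackLocked(long mark) {
        return (mark & LOCK_MASK) == LOCKED;
    }

    public static void main(String[] args) {
        AtomicLong header = new AtomicLong(0x1234_5600L | UNLOCKED);
        StackSlot slot = new StackSlot(0x7ffd_0000_1000L);
        System.out.println(tryLock(header, slot));
        System.out.println(isStackLocked(header.get()));
        unlock(header, slot);
        System.out.println(Long.toHexString(header.get()));
    }
}
```

In this model the owner is found by checking which thread's stack contains the address stored in the header, which is why the header itself is overloaded while the lock is held.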
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor.
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use int instead of size_t for cached offsets, to match the uncached offset type and avoid build failures

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234:

> 6232:   orr(hdr, hdr, markWord::unlocked_value);
> 6233:   // Clear lock-bits, into t2
> 6234:   eor(t2, hdr, markWord::unlocked_value);

On arm, I use a combination of bic and orr instead. That gives me, with just two instructions, added safety against someone handing in a "11" marked MW. I know, should never happen, but better safe.

    ldr(new_hdr, Address(obj, oopDesc::mark_offset_in_bytes()));
    bic(new_hdr, new_hdr, markWord::lock_mask_in_place); // new header (00)
    orr(old_hdr, new_hdr, markWord::unlocked_value); // old header (01)

(note that I moved MW loading down into MA::fast_lock for unrelated reasons).

Unfortunately, on aarch64 there seem to be no bic variants that accept immediates.
So it would take one more instruction to get the same result:

    -  // Load (object->mark() | 1) into hdr
    -  orr(hdr, hdr, markWord::unlocked_value);
    -  // Clear lock-bits, into t2
    -  eor(t2, hdr, markWord::unlocked_value);
    +  // Prepare new and old header
    +  mov(t2, markWord::lock_mask_in_place);
    +  bic(t2, hdr, t2);
    +  orr(hdr, t2, markWord::unlocked_value);

But maybe there is a better way that does not need three instructions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154111182

From kevinw at openjdk.org Fri Mar 31 08:54:00 2023
From: kevinw at openjdk.org (Kevin Walls)
Date: Fri, 31 Mar 2023 08:54:00 GMT
Subject: RFR: 8305237: CompilerDirectives DCmds permissions correction
Message-ID:

The Permissions in DCmds relate to remote usage over JMX.
"monitor" is generally for reading information, and "control" is generally for making changes.
The DCmds for changing compiler directives should have "control" as the required permission.

Tests in test/hotspot/jtreg/serviceability/dcmd/compiler and test/hotspot/jtreg/compiler/compilercontrol still pass with this change.

-------------

Commit messages:
 - 8305237: CompilerDirectives DCmds permissions correction

Changes: https://git.openjdk.org/jdk/pull/13262/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13262&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8305237
Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/13262.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13262/head:pull/13262

PR: https://git.openjdk.org/jdk/pull/13262

From kevinw at openjdk.org Fri Mar 31 08:54:02 2023
From: kevinw at openjdk.org (Kevin Walls)
Date: Fri, 31 Mar 2023 08:54:02 GMT
Subject: RFR: 8305237: CompilerDirectives DCmds permissions correction
In-Reply-To:
References:
Message-ID:

On Fri, 31 Mar 2023 08:24:19 GMT, Kevin Walls wrote:

> The Permissions in DCmds relate to remote usage over JMX.
> "monitor" is generally for reading information, and "control" is generally for making changes. > The DCmds for changing compiler directives should have "control" as the required permission. > > Tests in test/hotspot/jtreg/serviceability/dcmd/compiler and test/hotspot/jtreg/compiler/compilercontrol still pass with this change. This has a lot of labels for a trivial change in a very niche feature, but they all seem relevant. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13262#issuecomment-1491551796 From alanb at openjdk.org Fri Mar 31 10:31:31 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 31 Mar 2023 10:31:31 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v5] In-Reply-To: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: > JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. > > There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. > > In addition, there are a small number of implementation changes to sync up from the loom fibers branch: > > - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. 
> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. > - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. > - New system property to print a stack trace when a virtual thread sets its own value of a TL. > - ThreadPerTaskExecutor is changed to use FutureTask. > > Testing: tier1-6. Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - Expand tests for jdk.ThreadSleep event - Review feedback - Merge - Fix ThreadSleepEvent again - Test updates - ThreadSleepEvent refactoring - Merge - Merge - Initial sync from fibers branch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13203/files - new: https://git.openjdk.org/jdk/pull/13203/files/bfd2c816..722d5afa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=03-04 Stats: 4799 lines in 134 files changed: 3144 ins; 1060 del; 595 mod Patch: https://git.openjdk.org/jdk/pull/13203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13203/head:pull/13203 PR: https://git.openjdk.org/jdk/pull/13203 From mgronlun at openjdk.org Fri Mar 31 11:18:23 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 31 Mar 2023 11:18:23 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v15] In-Reply-To: References: Message-ID: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com> > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. 
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initializationTime = 12:31:15.574 (2023-03-08) > initializationDuration = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initializationTime = 12:31:31.037 (2023-03-08) > initializationDuration = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". 
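A rough sketch of how the proposed jdk.JavaAgent events could be read back from a recording with the existing JFR consumer API. The field names are taken from the examples above; that "initializationDuration" is readable through `getDuration` is an assumption, and the class name and the `describe` helper are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Prints one line per jdk.JavaAgent event found in a .jfr file. */
public class JavaAgentEvents {
    /** Formats the fields shown in the example events (hypothetical helper). */
    static String describe(String name, String options, boolean dynamic, Duration init) {
        String how = dynamic ? "dynamic" : "command line";
        return name + " [" + options + "] loaded via " + how
                + ", init took " + init.toMillis() + " ms";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JavaAgentEvents <recording.jfr>");
            return;
        }
        for (RecordedEvent e : RecordingFile.readAllEvents(Path.of(args[0]))) {
            // Skip everything except the event type proposed in this PR.
            if (!e.getEventType().getName().equals("jdk.JavaAgent")) {
                continue;
            }
            // Assumes the duration field is a timespan, as the printed examples suggest.
            System.out.println(describe(
                    e.getString("name"),
                    e.getString("options"),
                    e.getBoolean("dynamic"),
                    e.getDuration("initializationDuration")));
        }
    }
}
```

A recording made on a JDK without this change simply contains no jdk.JavaAgent events, so the loop prints nothing.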
>
> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>
> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>
> jdk.NativeAgent {
>   startTime = 12:31:40.398 (2023-03-08)
>   name = "jdwp"
>   options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>   dynamic = false
>   initializationTime = 12:31:36.142 (2023-03-08)
>   initializationDuration = 0,00184 ms
>   path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
> }
>
> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported.
>
> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback.
>
> #### Implementation
>
> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>
> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>
> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
>
> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp.
>
> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>
> Testing: jdk_jfr, tier 1 - 6
>
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12923/files
  - new: https://git.openjdk.org/jdk/pull/12923/files/ab74621b..07407a82

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=13-14

Stats: 5 lines in 2 files changed: 2 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/12923.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923

PR: https://git.openjdk.org/jdk/pull/12923

From mgronlun at openjdk.org Fri Mar 31 11:18:25 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Fri, 31 Mar 2023 11:18:25 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v14]
In-Reply-To:
References:
Message-ID:

On Fri, 31 Mar 2023 03:05:31 GMT, David Holmes wrote:

>> Markus Grönlund has updated the
pull request incrementally with one additional commit since the last revision: >> >> restore misssing frees > > src/hotspot/share/prims/agent.cpp line 533: > >> 531: if (thread->is_pending_jni_exception_check()) { >> 532: thread->clear_pending_jni_exception_check(); >> 533: } > > Unsure why we pretend the agent checked this - don't we want -Xcheck:jni to report a bug in the agent? Good question - I don't know. For dynamically loaded agents, there seems to be quite a lot of handling to return a JNI_OK, even though the agent failed to load or returned failure from the Agent_OnAttach. e.g. // Agent_OnAttach executed so completion status is JNI_OK return JNI_OK; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1154346856 From rkennke at openjdk.org Fri Mar 31 13:54:47 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 31 Mar 2023 13:54:47 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v48] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). 
The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor.
Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
 - Check underflow, top-of-stack and mark-bits for sanity, in fast_unlock() (aarch64)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/adeccaae..0e366206

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=47
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=46-47

Stats: 39 lines in 4 files changed: 36 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/10907.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Fri Mar 31 14:03:49 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 31 Mar 2023 14:03:49 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16]
In-Reply-To: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com>
References:
<5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com>
Message-ID:

On Tue, 28 Mar 2023 19:50:36 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar.
>>
>> This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   s390x NULL to nullptr

This obviously breaks arm, since its implementation is missing. I opened https://bugs.openjdk.org/browse/JDK-8305387 to track this. This is unfortunate since it holds up work on arm in other areas, in my case for https://github.com/openjdk/jdk/pull/10907.

> This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x

I wonder about the explicit exclusion of arm. Every other CPU seems to be taken care of, even those Oracle does not maintain. Just curious, was there a special reason for excluding arm?
------------- PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1491971108 From dnsimon at openjdk.org Fri Mar 31 14:31:48 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 31 Mar 2023 14:31:48 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16] In-Reply-To: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> References: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> Message-ID: On Tue, 28 Mar 2023 19:50:36 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar. >> >> This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > s390x NULL to nullptr It has also broken GraalVM Native Image. 
I'll open a JBS issue with a reproducer soon but here's hs-err from a slowdebug JDK build showing the problem: [hs_err_pid30379.log](https://github.com/openjdk/jdk/files/11122818/hs_err_pid30379.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1492011186 From stuefe at openjdk.org Fri Mar 31 14:41:52 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 31 Mar 2023 14:41:52 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v48] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 13:54:47 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Check underflow, top-of-stack and mark-bits for sanity, in fast_unlock() (aarch64)

Here is the ARM port. It applies cleanly atop 0e3662066ead1bc8883fc0c32dce9f795ecd7c9d.

https://github.com/tstuefe/jdk/tree/ARM-port-8291555
https://github.com/openjdk/jdk/commit/4c6f5abec58d6a10d3f1eb010f10d00c3cabb43a

Differences to other ports:
- I don't have as many registers (only 15), so I almost always need to push at least one. My variants of MA::fast_(un)lock take care of this: you pass in a save mask to tell it which temps to save, which is also handy for error analysis. (Multi-pushes are not combined, but I don't think there is a need to make this more elaborate.)
- I moved the MW loads down into MA::fast_(un)lock. There is no need to do this at the caller site; nothing is saved by that.
- In MA::fast_lock, I use a combination of `bic` and `orr` to prepare new and old headers. The advantage is that it gives me added safety against accidentally passing in "11"-marked MWs, at no added cost.
- In MA::fast_unlock, I also use `bic`, since it allows a denser encoding. I cannot easily express ~lockmask as an immediate.
- In MA::fast_(un)lock, I poison supposed temp regs in debug builds. I leave them alone in release builds.

Performance: With C2, a lot better than the old solution - about 30%. The reason is a bug in the old solution that causes the JVM to enter the slow path for every monitorexit. That should be fixed separately.

I tested the ARM port based on an older version of your patch. Unfortunately, ARM is currently broken due to an unrelated problem (https://github.com/openjdk/jdk/pull/12778), so I cannot test it against your new patch variant.

------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1492026658 From matsaave at openjdk.org Fri Mar 31 15:36:59 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 31 Mar 2023 15:36:59 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16] In-Reply-To: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> References: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> Message-ID: On Tue, 28 Mar 2023 19:50:36 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. 
>> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar. >> >> This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > s390x NULL to nullptr > This obviously breaks arm, since its implementation is missing. I opened https://bugs.openjdk.org/browse/JDK-8305387 to track this. This is unfortunate since it holds work on arm in other areas, in my case for #10907. > > > This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x > > I wonder about the explicit exclusion of arm. Every other CPU seems to be taken care of, even those Oracle does not maintain. Just curious, was there a special reason for excluding arm? There is no special reason ARM32 was excluded other than the fact no porter has picked it up yet. Fortunately I was able to get in contact with porters for the other platforms, but nobody took on the ARM port until now. Thank you for opening the issue! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1492144686 From stuefe at openjdk.org Fri Mar 31 16:01:01 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 31 Mar 2023 16:01:01 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v48] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 13:54:47 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. 
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? 
>> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Check underflow, top-of-stack and mark-bits for sanity, in fast_unlock() (aarch64)

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6264:

> 6262: ldrw(t1, Address(rthread, JavaThread::lock_stack_top_offset()));
> 6263: cmpw(t1, (unsigned)LockStack::start_offset());
> 6264: br(Assembler::GT, stack_ok);

I had to think hard about "GT" here.

We could have entered with the thread holding just one inflated lock; then LockStack would be empty but the monitorexit would still be valid. You now do check in the callers for markWord::monitor_value. But the lock could have been inflated concurrently after the caller checks and before this point.

But then the LockStack would not have changed, since it represents what the current thread *thinks* are thin locks, not what are actually thin locks? In other words, LockStack is only modified by its owning thread, never from the outside.

So this *should* be correct, but it's certainly a brain teaser. Maybe add a comment?

E.g. "These checks rely on the fact that LockStack is only ever modified by its owning thread, even if the lock got inflated concurrently; removal of LockStack entries after inflation will be delayed in that case" or somesuch.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6274:

> 6272: ldr(t1, Address(rthread, t1));
> 6273: cmpoop(t1, obj);
> 6274: br(Assembler::EQ, tos_ok);

STOP missing?

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6280:

> 6278: // Check that hdr is fast-locked.
> 6279: Label hdr_ok;
> 6280: ands(zr, hdr, markWord::lock_mask_in_place);

Confused about ANDS here. So ZR would receive the result of the AND, but I assume zr is immutable and the result is discarded? If so, why not just use TST(hdr, markWord::lock_mask_in_place)?
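
To make the three sanity checks discussed above easier to follow (underflow, top-of-stack, mark bits), here is a minimal plain C++ model. It is only a sketch and not the macro-assembler code under review: `LockStackSketch`, `UnlockCheck`, `check_fast_unlock`, and the constants are hypothetical names. The capacity of 8 and the "lowest two header bits are 00 when fast-locked" rule are taken from the PR description; everything else is an assumption for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Lowest two header bits; 00 means fast-locked (per the PR description).
constexpr std::uintptr_t kLockMask = 0x3;
// Fixed lock-stack size stated in the PR description.
constexpr std::size_t kLockStackCapacity = 8;

// Hypothetical model of the per-thread lock stack. Only the owning thread
// pushes or pops; other threads may change an object's header, never this.
struct LockStackSketch {
  std::array<const void*, kLockStackCapacity> slots{};
  std::size_t top = 0;  // number of entries currently in use

  // Returns false when the stack is full; the caller would then take the
  // slow path and inflate the lock to a monitor.
  bool push(const void* obj) {
    if (top == kLockStackCapacity) return false;
    slots[top++] = obj;
    return true;
  }
};

enum class UnlockCheck { Ok, Underflow, NotTopOfStack, NotFastLocked };

// Mirrors the order of the checks quoted in the review: first underflow,
// then top-of-stack, then the header's lock bits.
UnlockCheck check_fast_unlock(const LockStackSketch& ls, const void* obj,
                              std::uintptr_t hdr) {
  if (ls.top == 0) return UnlockCheck::Underflow;
  if (ls.slots[ls.top - 1] != obj) return UnlockCheck::NotTopOfStack;
  if ((hdr & kLockMask) != 0) return UnlockCheck::NotFastLocked;
  return UnlockCheck::Ok;
}
```

In this model a concurrent inflation would change only the `hdr` value passed in, while `LockStackSketch` stays as the owning thread left it, which is the property the underflow check relies on.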
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154611972 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154620174 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154647227 From stuefe at openjdk.org Fri Mar 31 16:01:02 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 31 Mar 2023 16:01:02 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v48] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 15:24:07 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Check underflow, top-of-stack and mark-bits for sanity, in fast_unlock() (aarch64) > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6264: > >> 6262: ldrw(t1, Address(rthread, JavaThread::lock_stack_top_offset())); >> 6263: cmpw(t1, (unsigned)LockStack::start_offset()); >> 6264: br(Assembler::GT, stack_ok); > > I had to think hard about "GT" here. > > We could have entered with the thread holding just one inflated lock; then LockStack would be empty but the monitorexit would still be valid. You now do check in the callers for markWord::monitor_value. But the lock could have been inflated concurrently after the caller checks and before this point. > > But then the LockStack would not have changed, since it represents what the current thread *thinks* are thin locks, not what are actually thin locks? In other words, LockStack is only modified by its owning thread, never from the outside. > > So this *should* be correct, but it's certainly a brain teaser. Maybe add a comment? > > E.g. "These checks rely on the fact that LockStack is only ever modified by its owning thread, even if the lock got inflated concurrently; removal of LockStack entries after inflation will be delayed in that case" or somesuch. 
This also mandates that fast_lock can only ever be entered if the current thread thinks that the lock in question is a thin lock. So the caller checks for markWord::monitor_value are mandatory now.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154619603 From cjplummer at openjdk.org Fri Mar 31 19:37:31 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 31 Mar 2023 19:37:31 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v14] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 06:52:18 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which enables JVMTI VTMS transition notifications when an agent is loaded into a running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`s to correctly set the static counter `_VTMS_transition_count` needed for the VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regressions. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: addressed next round of review suggestions

Changes look good, but for the most part I just looked at the test-related changes.
------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13133#pullrequestreview-1367488100 From rkennke at openjdk.org Fri Mar 31 19:39:03 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 31 Mar 2023 19:39:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v49] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' 
is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. 
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fastpath, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Thomas' comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/0e366206..1ad95851 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=47-48 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From cjplummer at openjdk.org Fri Mar 31 19:44:13 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 31 Mar 2023 19:44:13 GMT Subject: RFR: 8305237: CompilerDirectives DCmds permissions correction In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 08:24:19 GMT, Kevin Walls wrote: > The Permissions in DCmds relate to remote usage over JMX. > "monitor" is generally for reading information, and "control" is generally for making changes. > The DCmds for changing compiler directives should have "control" as the required permission. > > Tests in test/hotspot/jtreg/serviceability/dcmd/compiler and test/hotspot/jtreg/compiler/compilercontrol still pass with this change. I assume this means we have no tests that try to change these compiler directives. Should we?
------------- PR Comment: https://git.openjdk.org/jdk/pull/13262#issuecomment-1492504793 From cjplummer at openjdk.org Fri Mar 31 19:57:18 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 31 Mar 2023 19:57:18 GMT Subject: RFR: 8305341: Alignment outside of HotSpot should be enforced by alignas instead of compiler specific attributes In-Reply-To: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> References: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> Message-ID: On Fri, 31 Mar 2023 06:07:39 GMT, Julian Waters wrote: > C11 has been stable for a long time on all platforms, so native code can use the standard alignas operator for alignment requirements I don't have any comments on this change in general (it's not something I've dealt with in the past), but I did notice that there are a couple of places you missed:

src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:119:#define ATTRIBUTE_ALIGNED(x) __declspec(align(x))
src/java.desktop/share/native/libfreetype/include/freetype/internal/ftvalid.h:82: /* __declspec(align())' in order to compile cleanly with */
src/java.desktop/share/native/libfreetype/src/smooth/ftgrays.c:484: /* __declspec(align())' in order to compile cleanly with */

For the 2nd and 3rd ones you would want to remove all of the following:

#if defined( _MSC_VER ) /* Visual C++ (and Intel C++) */
  /* We disable the warning `structure was padded due to */
  /* __declspec(align())' in order to compile cleanly with */
  /* the maximum level of warnings. */
#pragma warning( push )
#pragma warning( disable : 4324 )
#endif /* _MSC_VER */
...
#if defined( _MSC_VER )
#pragma warning( pop )
#endif

------------- PR Comment: https://git.openjdk.org/jdk/pull/13258#issuecomment-1492522828 From dcubed at openjdk.org Fri Mar 31 20:23:50 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Fri, 31 Mar 2023 20:23:50 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v49] In-Reply-To: References: Message-ID: <5VKEOAqoXvtmTDHHUYT1CkVP36je-lt_O9i1vC2zNyg=.c57cb302-e730-422b-a8da-e224ad4bc010@github.com> On Fri, 31 Mar 2023 19:39:03 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Thomas' comments v47 passed Mach5 Tier1. I'll continue with additional Mach5 testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1492567631 From sspitsyn at openjdk.org Fri Mar 31 20:32:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 Mar 2023 20:32:24 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v14] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 06:52:18 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: addressed next round of review suggestions Leonid and Chris, thank you for review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1492575146 From sspitsyn at openjdk.org Fri Mar 31 20:39:17 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 Mar 2023 20:39:17 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v15] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which enables JVMTI VTMS transition notifications when an agent is loaded into a running VM. This function executes a VM operation that counts the VTMS transition bits in all `JavaThread`s to correctly set the static counter `_VTMS_transition_count` needed for the VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run tiers 1-6 to make sure there are no regressions.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor simplification in ToggleNotifyJvmtiTest.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/aef87273..c55b6b38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=13-14 Stats: 13 lines in 1 file changed: 3 ins; 7 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From dcubed at openjdk.org Fri Mar 31 20:56:53 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 31 Mar 2023 20:56:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v49] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 19:39:03 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Thomas' comments As expected, jtreg/vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock005/TestDescription.java continues to pass in release, fastdebug and slowdebug bits on my MBP13. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1492595034 From jlu at openjdk.org Fri Mar 31 21:41:17 2023 From: jlu at openjdk.org (Justin Lu) Date: Fri, 31 Mar 2023 21:41:17 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v5] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Fri, 17 Mar 2023 22:27:48 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file.
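To illustrate why anything that reads these files has to agree on the encoding, here is a small Java sketch. It relies only on documented `java.util.Properties` behavior: `load(InputStream)` decodes bytes as ISO 8859-1, while `load(Reader)` uses whatever charset the reader was created with. The class name `Utf8PropertiesSketch`, the helper names and the sample key are made up for illustration; this is not the build logic from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical demo of reading .properties content as UTF-8 versus ISO 8859-1. */
public class Utf8PropertiesSketch {

    /** Reads the content through a Reader, so the bytes are decoded as UTF-8. */
    public static Properties loadUtf8(byte[] content) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    /** Reads the content through an InputStream, which Properties decodes as ISO 8859-1. */
    public static Properties loadLatin1(byte[] content) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(content)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // "Native" form: the non-ASCII characters are stored directly as UTF-8 bytes.
        byte[] nativeForm = "greeting=Gr\u00fc\u00dfe\n".getBytes(StandardCharsets.UTF_8);
        // Escaped form: the same value written with literal backslash-u sequences.
        byte[] escapedForm = "greeting=Gr\\u00fc\\u00dfe\n".getBytes(StandardCharsets.ISO_8859_1);

        String expected = "Gr\u00fc\u00dfe";
        System.out.println("native form read as UTF-8 matches: "
                + expected.equals(loadUtf8(nativeForm).getProperty("greeting")));
        System.out.println("escaped form read as ISO 8859-1 matches: "
                + expected.equals(loadLatin1(escapedForm).getProperty("greeting")));
        System.out.println("native form read as ISO 8859-1 matches: "
                + expected.equals(loadLatin1(nativeForm).getProperty("greeting")));
    }
}
```

The escaped form is readable either way because it is pure ASCII; once the files hold native UTF-8, a consumer that still assumes ISO 8859-1 will see mangled text for non-ASCII values.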
> > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Close streams when finished loading into props Something to consider is that IntelliJ defaults .properties files to ISO 8859-1. https://www.jetbrains.com/help/idea/properties-files.html#encoding So users of IntelliJ (and other IDEs that default to ISO 8859-1 for .properties files) will need to change the default encoding to UTF-8 for such files. Or ideally, the respective IDEs can change their default encoding for .properties files if this change is integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1492640306 From dcubed at openjdk.org Fri Mar 31 22:02:51 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 31 Mar 2023 22:02:51 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v49] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 19:39:03 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. 
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol.
This would be implemented in a separate PR
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Thomas' comments

v47 is hitting an assertion failure in my Mach5 Tier2 and Tier3 testing:

#  Internal Error (/opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S30407/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b94a7623-5f98-46f3-8e2c-08444e95afa4/runs/a5754c45-3d7a-46fa-ba4b-c52efcf6ca3b/workspace/open/src/hotspot/share/runtime/lockStack.cpp:78), pid=1731612, tid=1731617
#  assert((_top < end_offset())) failed: lockstack overflow: _top 1704 end_offset 1704
#
# JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2023-03-31-1908037.daniel.daugherty.8291555forjdk21.git)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-internal-LTS-2023-03-31-1908037.daniel.daugherty.8291555forjdk21.git, mixed mode, tiered, compressed oops, compressed class ptrs, serial gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x10cfa0c]  LockStack::verify_no_thread(char const*) const+0x288

Please see the bug for the latest details as I investigate.
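For readers following the lock-stack description in the quoted PR text and the overflow assertion above, here is a toy model in Java. It is only a sketch under stated assumptions, not HotSpot's lockStack.cpp: the class name `ToyLockStack`, its methods and the `CAPACITY` constant are hypothetical, and only the fixed size of 8 entries, the linear-scan ownership check and the "full means take the slow path" rule are taken from the description.

```java
// Toy model of a per-thread lock-stack; all names here are hypothetical.
final class ToyLockStack {
    // The PR text says the lock-stack currently has 8 elements.
    static final int CAPACITY = 8;

    private final Object[] slots = new Object[CAPACITY];
    private int top = 0;

    // Fast path: record a fast-locked object. Returns false when the stack
    // is full, in which case a caller would fall back to a full monitor.
    boolean tryPush(Object obj) {
        if (top == CAPACITY) {
            return false;
        }
        slots[top++] = obj;
        return true;
    }

    // "Does the current thread own me?" answered by a short linear scan.
    boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == obj) {
                return true;
            }
        }
        return false;
    }

    // Forget an object on unlock. Shifting the remaining entries down is an
    // assumption of this sketch, not a statement about the real code.
    void remove(Object obj) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == obj) {
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null;
                return;
            }
        }
    }

    public static void main(String[] args) {
        ToyLockStack stack = new ToyLockStack();
        for (int i = 0; i < CAPACITY; i++) {
            stack.tryPush(new Object());
        }
        // With all slots used, the next push is refused instead of overflowing.
        System.out.println("push on full stack accepted: " + stack.tryPush(new Object()));
    }
}
```

In this model a push on a full stack is refused rather than asserted on; the assertion in the log reports `_top` equal to `end_offset`, i.e. the stack was already full at that point.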
-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1492658176

From dlong at openjdk.org Fri Mar 31 22:41:56 2023
From: dlong at openjdk.org (Dean Long)
Date: Fri, 31 Mar 2023 22:41:56 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v47]
In-Reply-To:
References:
Message-ID:

On Fri, 31 Mar 2023 07:25:48 GMT, Thomas Stuefe wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Use int instead of size_t for cached offsets, to match the uncached offset type and avoid build failures
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234:
>
>> 6232:   orr(hdr, hdr, markWord::unlocked_value);
>> 6233:   // Clear lock-bits, into t2
>> 6234:   eor(t2, hdr, markWord::unlocked_value);
>
> In arm, I use a combination of bic and orr instead. That gives me, with just two instructions, added safety against someone handing in a "11" marked MW. I know, should never happen, but better safe.
>
>
>     ldr(new_hdr, Address(obj, oopDesc::mark_offset_in_bytes()));
>     bic(new_hdr, new_hdr, markWord::lock_mask_in_place); // new header (00)
>     orr(old_hdr, new_hdr, markWord::unlocked_value); // old header (01)
>
> (note that I moved MW loading down into MA::fast_lock for unrelated reasons).
>
> Unfortunately, on aarch64 there seem to be no bic variants that accept immediates. So it would take one more instruction to get the same result:
>
>
> -  // Load (object->mark() | 1) into hdr
> -  orr(hdr, hdr, markWord::unlocked_value);
> -  // Clear lock-bits, into t2
> -  eor(t2, hdr, markWord::unlocked_value);
> +  // Prepare new and old header
> +  mov(t2, markWord::lock_mask_in_place);
> +  bic(t2, hdr, t2);
> +  orr(hdr, t2, markWord::unlocked_value);
>
>
> But maybe there is a better way that does not need three instructions.

There is a BFC (Bitfield Clear) pseudo-instruction that uses the BFM instruction.
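The difference between the two instruction sequences quoted above can be checked with plain bit arithmetic. The following Java sketch is illustrative only: `HeaderBits`, `orrEor` and `bicOrr` are hypothetical names, and the constants are assumed to mirror `markWord::lock_mask_in_place` (two low bits) and `markWord::unlocked_value` (01) as used in the discussion. Each method returns the expected old header and the new header that a subsequent CAS would use.

```java
// Bit-level model of the two header-preparation sequences; names are hypothetical.
final class HeaderBits {
    static final long LOCK_MASK = 0b11L; // assumed stand-in for markWord::lock_mask_in_place
    static final long UNLOCKED  = 0b01L; // assumed stand-in for markWord::unlocked_value

    // orr + eor: old = hdr | 01, new = old ^ 01. Returns {old, new}.
    static long[] orrEor(long hdr) {
        long oldHdr = hdr | UNLOCKED;
        long newHdr = oldHdr ^ UNLOCKED;
        return new long[] { oldHdr, newHdr };
    }

    // bic + orr: new = hdr with both lock bits cleared, old = new | 01. Returns {old, new}.
    static long[] bicOrr(long hdr) {
        long newHdr = hdr & ~LOCK_MASK;
        long oldHdr = newHdr | UNLOCKED;
        return new long[] { oldHdr, newHdr };
    }

    public static void main(String[] args) {
        long upperBits = 0xABCD00L; // arbitrary payload with the two low bits clear
        for (long low = 0; low <= 3; low++) {
            long hdr = upperBits | low;
            long[] a = orrEor(hdr);
            long[] b = bicOrr(hdr);
            System.out.println("low bits " + Long.toBinaryString(low)
                + ": orr/eor old=" + Long.toBinaryString(a[0] & LOCK_MASK)
                + " new=" + Long.toBinaryString(a[1] & LOCK_MASK)
                + ", bic/orr old=" + Long.toBinaryString(b[0] & LOCK_MASK)
                + " new=" + Long.toBinaryString(b[1] & LOCK_MASK));
        }
    }
}
```

For an unlocked header (01) both variants agree. For a "11" header, orr/eor keeps 11 as the expected value and produces 10 as the new value, while bic/orr expects 01 and produces 00, so in this model a CAS against the actual 11 header would not match.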
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154955795 From naoto at openjdk.org Fri Mar 31 22:48:29 2023 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 31 Mar 2023 22:48:29 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v5] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Fri, 17 Mar 2023 22:27:48 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Close streams when finished loading into props Hmm, I just wonder why they are sticking to ISO-8859-1 as the default. I know j.u.Properties defaults to 8859-1, but PropertyResourceBundle, which is their primary use, has defaulted to UTF-8 since JDK 9 (https://openjdk.org/jeps/226) ------------- PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1492682703
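The encoding difference mentioned above can be seen with a small Java sketch. It assumes JDK 9 or later with the `java.util.PropertyResourceBundle.encoding` system property left unset; the class name `PropertiesEncodingDemo`, the key `greeting` and the sample value are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;

// Hypothetical demo class: loads the same UTF-8 bytes through three paths.
final class PropertiesEncodingDemo {
    // "greeting=héllo" encoded as UTF-8 (é written as an escape to keep the source ASCII).
    static final byte[] UTF8_BYTES =
        "greeting=h\u00e9llo\n".getBytes(StandardCharsets.UTF_8);

    // Properties.load(InputStream) decodes the bytes as ISO 8859-1.
    static String viaPropertiesStream() {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(UTF8_BYTES));
            return props.getProperty("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Properties.load(Reader) uses whatever charset the Reader was built with.
    static String viaPropertiesReader() {
        try {
            Properties props = new Properties();
            props.load(new InputStreamReader(
                new ByteArrayInputStream(UTF8_BYTES), StandardCharsets.UTF_8));
            return props.getProperty("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // PropertyResourceBundle(InputStream) reads UTF-8 by default since JDK 9.
    static String viaBundle() {
        try {
            return new PropertyResourceBundle(
                new ByteArrayInputStream(UTF8_BYTES)).getString("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Properties.load(InputStream): " + viaPropertiesStream());
        System.out.println("Properties.load(Reader, UTF-8): " + viaPropertiesReader());
        System.out.println("PropertyResourceBundle(InputStream): " + viaBundle());
    }
}
```

Only the first path mis-decodes the two UTF-8 bytes of the accented character as two ISO 8859-1 characters, which is the behaviour an editor defaulting to ISO 8859-1 is matching.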