From coleenp at openjdk.org Sat Apr 1 00:09:25 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 1 Apr 2023 00:09:25 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates In-Reply-To: <9BJjGpmSrwRXFXsj2EC8pewLrA1Z6PYJfn5pMbf3vnE=.9000e432-b2da-4047-b8ac-a045aceb53b8@github.com> References: <9BJjGpmSrwRXFXsj2EC8pewLrA1Z6PYJfn5pMbf3vnE=.9000e432-b2da-4047-b8ac-a045aceb53b8@github.com> Message-ID: On Fri, 31 Mar 2023 21:24:27 GMT, Vladimir Ivanov wrote: >> The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. >> Tested with tier1-8. > > src/hotspot/share/ci/ciEnv.cpp line 519: > >> 517: >> 518: Klass* found_klass; >> 519: if (!require_local) { > > Since JVMCI needs the very same code, does it make sense to put it into a wrapper method? This is a good idea. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13270#discussion_r1155015640 From coleenp at openjdk.org Sat Apr 1 00:21:11 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 1 Apr 2023 00:21:11 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v2] In-Reply-To: References: Message-ID: > The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add a utility function for SystemDictionary find. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13270/files - new: https://git.openjdk.org/jdk/pull/13270/files/17108a43..0d48d2a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13270&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13270&range=00-01 Stats: 50 lines in 4 files changed: 24 ins; 22 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13270/head:pull/13270 PR: https://git.openjdk.org/jdk/pull/13270 From coleenp at openjdk.org Sat Apr 1 00:21:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 1 Apr 2023 00:21:12 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v2] In-Reply-To: <9BJjGpmSrwRXFXsj2EC8pewLrA1Z6PYJfn5pMbf3vnE=.9000e432-b2da-4047-b8ac-a045aceb53b8@github.com> References: <9BJjGpmSrwRXFXsj2EC8pewLrA1Z6PYJfn5pMbf3vnE=.9000e432-b2da-4047-b8ac-a045aceb53b8@github.com> Message-ID: On Fri, 31 Mar 2023 21:25:17 GMT, Vladimir Ivanov wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add a utility function for SystemDictionary find. > > Looks good. Thanks for taking care of it, Coleen. > > FTR there are other redundant usages of `Compile_lock` in CI code. There isn't a guarantee that the klasses the compiler finds in the hierarchy are also in the dictionary. Changing scope of the Compile_lock breaks this guarantee. In discussions, @iwanowww thinks this race is benign as the klass returned by dependencies might not be in the dictionary (if hidden class for example). Creating an instance requires that the class be in the dictionary so we think Class.forName(this.getClass().getName()) would not be affected since "this" is an instance. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13270#issuecomment-1492751187 From coleenp at openjdk.org Sat Apr 1 00:42:30 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 1 Apr 2023 00:42:30 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v2] In-Reply-To: <9BJjGpmSrwRXFXsj2EC8pewLrA1Z6PYJfn5pMbf3vnE=.9000e432-b2da-4047-b8ac-a045aceb53b8@github.com> References: <9BJjGpmSrwRXFXsj2EC8pewLrA1Z6PYJfn5pMbf3vnE=.9000e432-b2da-4047-b8ac-a045aceb53b8@github.com> Message-ID: On Fri, 31 Mar 2023 21:25:17 GMT, Vladimir Ivanov wrote: > FTR there are other redundant usages of Compile_lock in CI code. I filed a couple of RFEs to remove the one around implementor() and another in Universe::genesis(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/13270#issuecomment-1492760407 From xliu at openjdk.org Sat Apr 1 00:47:23 2023 From: xliu at openjdk.org (Xin Liu) Date: Sat, 1 Apr 2023 00:47:23 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v5] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 30 Mar 2023 23:36:20 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1:
>>
>> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png)
>>
>> This PR adds support for scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list).
>>
>> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new class to support rematerialization of SR objects that are part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges.
>>
>> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge, which will render the Phi useless.
>>
>> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regression. I also tested with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>
>   Address PR feedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges.

src/hotspot/share/opto/escape.cpp line 639: > 637: call->add_req(selector); > 638: > 639: for (uint i = 1; i < ophi->req(); i++) { Comparing to new_phi and selector, I think this is the heavy-lifting work. You "replace" all appearances of ptn with SPSO. This logic almost overlaps the 'scalar replacement' part in MacroExpand. Do you consider to perform the transformation in MacroExpand? Your prior changes have already removed NSR marks, ME/SR will consider 'ptn'. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1155028545 From dholmes at openjdk.org Sat Apr 1 01:05:43 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 1 Apr 2023 01:05:43 GMT Subject: RFR: 8297539: Use PrimitiveConversions::cast for local uses of the int<->float union conversion trick [v4] In-Reply-To: <7NKaN1fkjZMSZIpCAP-q32n3PLpmzc8kbBAnNJ6CnR8=.65aa06fe-ead8-41c1-a52c-1b8c78fb6ded@github.com> References: <7NKaN1fkjZMSZIpCAP-q32n3PLpmzc8kbBAnNJ6CnR8=.65aa06fe-ead8-41c1-a52c-1b8c78fb6ded@github.com> Message-ID: <0Q6C2rmWozvO5FdBdpHrX6yN37EN5aBbx0foIVGPWNs=.75055d74-8077-4402-b570-b070ba9293e0@github.com> On Fri, 31 Mar 2023 09:24:52 GMT, Afshin Zafari wrote: >> **Only** the instances of using `union` for converting `int` to `float` are replaced with call to `PrimitiveConversions::cast(From)` method. Some few cases with conversion of `long` <->`double` are also replaced with `PrimitiveConversions::cast(From)`. The other instances where the union contains other types of fields than `int` and `float` are left unchanged. >> >> ### Test >> local hotspot:tier1 >> mach5 tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8297539: Use PrimitiveConversions::cast for local uses of the int<->float union conversion trick Marked as reviewed by dholmes (Reviewer). 
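
For context on the conversion the quoted description refers to, here is a minimal sketch of a bit-preserving cast. It is not the HotSpot code: `bit_cast_sketch` is a hypothetical stand-in for `PrimitiveConversions::cast`, written with `std::memcpy`, and the real implementation may well differ. It only shows the kind of call that replaces writing one member of a local `union` and reading the other.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (an assumption, not the HotSpot API):
// reinterpret the bits of `from` as a value of type To.
template <typename To, typename From>
To bit_cast_sketch(const From& from) {
  static_assert(sizeof(To) == sizeof(From), "source and target must have the same size");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  To to;
  // Copying the object representation avoids type punning through a union.
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// int <-> float, the case named in the PR title.
inline std::int32_t float_to_bits(float f)   { return bit_cast_sketch<std::int32_t>(f); }
inline float bits_to_float(std::int32_t i)   { return bit_cast_sketch<float>(i); }

// long <-> double, the other case mentioned in the description.
inline std::int64_t double_to_bits(double d) { return bit_cast_sketch<std::int64_t>(d); }
inline double bits_to_double(std::int64_t i) { return bit_cast_sketch<double>(i); }
```

The size check is done at compile time, so a mismatched pair such as `float` and `std::int64_t` fails to build instead of silently reading garbage. The round-trip examples assume IEEE 754 `float` and `double`.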
------------- PR Review: https://git.openjdk.org/jdk/pull/13136#pullrequestreview-1367783212 From duke at openjdk.org Sat Apr 1 01:05:43 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sat, 1 Apr 2023 01:05:43 GMT Subject: Integrated: 8297539: Use PrimitiveConversions::cast for local uses of the int<->float union conversion trick In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 10:59:18 GMT, Afshin Zafari wrote: > **Only** the instances of using `union` for converting `int` to `float` are replaced with call to `PrimitiveConversions::cast(From)` method. Some few cases with conversion of `long` <->`double` are also replaced with `PrimitiveConversions::cast(From)`. The other instances where the union contains other types of fields than `int` and `float` are left unchanged. > > ### Test > local hotspot:tier1 > mach5 tiers 1-5 This pull request has now been integrated. Changeset: a19b28ab Author: Afshin Zafari Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/a19b28ab3ed2d2da4eb04ce9b187dda8a75ba16a Stats: 54 lines in 5 files changed: 6 ins; 21 del; 27 mod 8297539: Use PrimitiveConversions::cast for local uses of the int<->float union conversion trick Reviewed-by: coleenp, kbarrett, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13136 From sspitsyn at openjdk.org Sat Apr 1 03:23:37 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 1 Apr 2023 03:23:37 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 09:18:51 GMT, David Holmes wrote: >> src/hotspot/share/prims/jvmtiEnvBase.hpp line 166: >> >>> 164: >>> 165: const void* get_env_local_storage() { return _env_local_storage; } >>> 166: >> >> Why was this change/move necessary? Do I miss anything? > > It is now public, not protected. I see, thanks. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1155048753

From sspitsyn at openjdk.org Sat Apr 1 03:34:34 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Sat, 1 Apr 2023 03:34:34 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v15]
In-Reply-To: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com>
References: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com>
Message-ID:

On Fri, 31 Mar 2023 11:18:23 GMT, Markus Grönlund wrote:

>> Greetings,
>>
>> We are adding support to let JFR report on Agents.
>>
>> #### Design
>>
>> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize.
>>
>> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html)
>>
>> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
>>
>> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples:
>>
>> // Command line
>> jdk.JavaAgent {
>>   startTime = 12:31:19.789 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "foo=bar"
>>   dynamic = false
>>   initializationTime = 12:31:15.574 (2023-03-08)
>>   initializationDuration = 172 ms
>> }
>>
>> // Dynamic load
>> jdk.JavaAgent {
>>   startTime = 12:31:31.158 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "bar=baz"
>>   dynamic = true
>>   initializationTime = 12:31:31.037 (2023-03-08)
>>   initializationDuration = 64,1 ms
>> }
>>
>> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
>>
>> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar.
>>
>> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true)
>>
>> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method.
>>
>> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime".
>>
>> An agent can also be written in a native programming language using the [JVM Tool Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>> jdk.NativeAgent {
>>   startTime = 12:31:40.398 (2023-03-08)
>>   name = "jdwp"
>>   options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>   dynamic = false
>>   initializationTime = 12:31:36.142 (2023-03-08)
>>   initializationDuration = 0,00184 ms
>>   path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>> }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported.
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp.
>>
>> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from its older places will help reduce the sizes of the oversized arguments.cpp and threads.cpp.
>>
>> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>>
>> Testing: jdk_jfr, tier 1 - 6
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   fixes

src/hotspot/share/prims/agent.hpp line 1:

> 1: /*

The name for the class and file is too general. I'm wondering whether renaming the files to jvmtiAgent and the class to JvmtiAgent would work. In general, there is a convention to name JVMTI files with the "jvmti" prefix. It is a gray zone between Runtime and JVMTI but seems to belong more to JVMTI. The same applies to the AgentList class and file.
Also, these new files are good candidates to add here:

make/hotspot/lib/JvmFeatures.gmk:

ifneq ($(call check-jvm-feature, jvmti), true)
  JVM_CFLAGS_FEATURES += -DINCLUDE_JVMTI=0
  JVM_EXCLUDE_FILES += jvmtiGetLoadedClasses.cpp jvmtiThreadState.cpp jvmtiExtensions.cpp \
      jvmtiImpl.cpp jvmtiManageCapabilities.cpp jvmtiRawMonitor.cpp jvmtiUtil.cpp jvmtiTrace.cpp \
      jvmtiCodeBlobEvents.cpp jvmtiEnv.cpp jvmtiRedefineClasses.cpp jvmtiEnvBase.cpp jvmtiEnvThreadState.cpp \
      jvmtiTagMap.cpp jvmtiEventController.cpp evmCompat.cpp jvmtiEnter.xsl jvmtiExport.cpp \
      jvmtiClassFileReconstituter.cpp jvmtiTagMapTable.cpp
endif

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1155049911

From stuefe at openjdk.org Sat Apr 1 06:05:28 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 1 Apr 2023 06:05:28 GMT
Subject: RFR: JDK-8304815: Use NMT for more precise hs_err location printing [v9]
In-Reply-To:
References:
Message-ID:

> (This is a byproduct of work on the arm port for https://github.com/openjdk/jdk/pull/10907. I needed better debugging information in the hs-err file and in gdb.)
>
> Back in 2022 @zhengyu123 had the very nice idea of using NMT mapping info for smartening up pp in gdb: [JDK-8280289](https://bugs.openjdk.org/browse/JDK-8280289).
>
> The same idea can be applied to hs_err file location printing. NMT has information about all its mappings and can tell us where it thinks a given unknown pointer points into.
>
> This could be even more useful if the "find malloc block" part of that functionality would be smarter. As it is now, it only works if the pointer in question points to the start of a user-allocated area. Would be nice if the code could (carefully) search for the next valid-looking malloc header instead.
>
> --------------
>
> This patch does this: we introduce a new API, MemTracker::print_containing_region(void*), that tries to make sense of a given unknown pointer.
>
> It will search its mmap regions and print those if found. For malloc'ed pointers, it will carefully sniff out the immediate surroundings of the block, trying to find what looks like a valid malloc header. It uses SafeFetch to not trip over unmapped or protected pages. Note that, of course, we may get false recognition positives if it finds something that looks like a valid header. But even that could be useful (e.g. a remnant dead header may indicate we access memory after free).
>
> Looks like this (it's arm, so 32-bit pointers):
>
>
> Register to memory mapping:
>
> -> r0 = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer
> -> r1 = 0x75b02010 into life malloced block starting at 0x75b02010, size 184, tag mtObjectMonitor
> -> r2 = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer
>    r3 = 0x0 is nullptr
> -> r4 = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer
>    r5 = 0xb6d3bbc8: in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000
>    r6 = 0xffffffff is an unknown value
>    r7 = 0x0 is nullptr
>    r8 = 0x0000000a is an unknown value
>    r9 = 0x728a4308 is a thread
>    r10 = 0x0000000b is an unknown value
> -> fp = 0x753fe8cc in mmap'd memory region [0x75380000 - 0x75400000] by Thread Stack
>    r12 = 0x0 is nullptr
> -> sp = 0x753fe8b0 in mmap'd memory region [0x75380000 - 0x75400000] by Thread Stack
>    lr = 0xb69bc1d0: in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000
>    pc = 0xb69c8670: in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000
>
>
>
> The small caveat here is that NMT reporting needs ThreadCritical, and thus using it for location printing may block error reporting if it crashed inside a ThreadCritical section. We face the same issue today when printing the NMT report as part of the hs-err file.
>
> I think the usefulness of these printouts justifies this risk. However, I opened https://bugs.openjdk.org/browse/JDK-8304824 to investigate a better locking strategy for NMT.

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  fix test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13162/files
  - new: https://git.openjdk.org/jdk/pull/13162/files/dcc1e631..4f2d37c9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13162&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13162&range=07-08

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13162.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13162/head:pull/13162

PR: https://git.openjdk.org/jdk/pull/13162

From stuefe at openjdk.org Sat Apr 1 06:44:22 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 1 Apr 2023 06:44:22 GMT
Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 28 Mar 2023 21:41:12 GMT, Mikael Vidstedt wrote:

> FYI: We're currently in the process of investigating adding support for doing almost the exact opposite of this in JDK-8303215. In particular, we're looking to effectively "undo" the effects of huge page(s) temporarily having been used for the stack. There's potentially room for sharing some of the implementation aspects.

Interesting issue, but I don't think there is much overlap. Well, it depends on how that issue is solved. I'd like to get this issue out of the door though; the implementation is simple enough and can easily be changed later if we want to reuse code.
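
To illustrate the idea, here is a minimal sketch of pre-touching stack memory with `alloca()`, which is the approach the PR description names. This is not the code from the patch: `pretouch_stack_sketch` and both of its parameters are hypothetical, and `<alloca.h>` assumes a glibc-like platform.

```cpp
#include <alloca.h>
#include <cstddef>

// Hypothetical helper (an assumption, not taken from the patch):
// reserve `bytes` on the calling thread's stack and write one byte per page,
// so that the OS has to back those pages with real memory.
// Returns the number of writes performed.
std::size_t pretouch_stack_sketch(std::size_t bytes, std::size_t page_size) {
  if (bytes == 0 || page_size == 0) {
    return 0;
  }
  // volatile keeps the compiler from dropping the otherwise dead stores.
  volatile char* p = static_cast<volatile char*>(alloca(bytes));
  std::size_t touched = 0;
  // Walk from the highest address to the lowest, assuming a stack that
  // grows towards lower addresses, so pages are touched in growth order.
  std::size_t off = bytes;
  while (off > 0) {
    p[off - 1] = 0;
    ++touched;
    off -= (off < page_size) ? off : page_size;
  }
  return touched;
}
```

The caller has to keep `bytes` well below the remaining stack size of the thread; the sketch does no such check itself.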
-------------

PR Comment: https://git.openjdk.org/jdk/pull/10403#issuecomment-1492852809

From stuefe at openjdk.org Sat Apr 1 06:57:34 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 1 Apr 2023 06:57:34 GMT
Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v5]
In-Reply-To:
References:
Message-ID:

> When doing performance- and footprint analysis, `AlwaysPreTouch` option is very handy for reducing noise. It would be good to have a similar option for pre-touching thread stacks. In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions.
>
> Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. Touching is super-simple using `alloca()`. Also, regression test.
>
> Examples:
>
> NMT, thread stacks, 10000 Threads, default:
>
>
> -  Thread (reserved=10332400KB, committed=331828KB)
>           (thread #10021)
>           (stack: reserved=10301560KB, committed=300988KB)
>           (malloc=19101KB #60755)
>           (arena=11739KB #20037)
>
>
> NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks:
>
>
> -  Thread (reserved=10332400KB, committed=10284360KB)
>           (thread #10021)
>           (stack: reserved=10301560KB, committed=10253520KB)
>           (malloc=19101KB #60755)
>           (arena=11739KB #20037)

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains three commits:

 - Merge
 - test changes, comment change
 - AlwaysPreTouchStacks

-------------

Changes: https://git.openjdk.org/jdk/pull/10403/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10403&range=04
  Stats: 153 lines in 4 files changed: 153 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10403.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10403/head:pull/10403

PR: https://git.openjdk.org/jdk/pull/10403

From qamai at openjdk.org Sat Apr 1 07:44:25 2023
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sat, 1 Apr 2023 07:44:25 GMT
Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v4]
In-Reply-To:
References:
Message-ID:

> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method.
>
> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations.
>
> Please take a look and have some reviews. Thank you very much.

Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains ten commits: - instruction asserts - Merge branch 'master' into sliceIntrinsics - add comments explaining anonymous classes - address reviews - sse2, increase warmup - aesthetic - optimise 64B - add jmh - vector slice intrinsics ------------- Changes: https://git.openjdk.org/jdk/pull/12909/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=03 Stats: 1603 lines in 58 files changed: 1277 ins; 257 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/12909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12909/head:pull/12909 PR: https://git.openjdk.org/jdk/pull/12909 From rkennke at openjdk.org Sat Apr 1 08:00:51 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sat, 1 Apr 2023 08:00:51 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v50] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor.
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix assert in lock-stack boundaries check

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/1ad95851..298031b6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=49
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=48-49

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Sat Apr 1 08:00:51 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Sat, 1 Apr 2023 08:00:51 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v49]
In-Reply-To:
References:
Message-ID:

On Fri, 31 Mar 2023 21:59:19 GMT, Daniel D.
Daugherty wrote:

> v47 is hitting an assertion failure in my Mach5 Tier2 and Tier3 testing:
>
> ```
> # Internal Error (/opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S30407/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b94a7623-5f98-46f3-8e2c-08444e95afa4/runs/a5754c45-3d7a-46fa-ba4b-c52efcf6ca3b/workspace/open/src/hotspot/share/runtime/lockStack.cpp:78), pid=1731612, tid=1731617
> # assert((_top < end_offset())) failed: lockstack overflow: _top 1704 end_offset 1704
> #
> # JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2023-03-31-1908037.daniel.daugherty.8291555forjdk21.git)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-internal-LTS-2023-03-31-1908037.daniel.daugherty.8291555forjdk21.git, mixed mode, tiered, compressed oops, compressed class ptrs, serial gc, linux-aarch64)
> # Problematic frame:
> # V [libjvm.so+0x10cfa0c] LockStack::verify_no_thread(char const*) const+0x288
> ```
>
> Please see the bug for the latest details as I investigate.

Uh, that is a simple mistake. It should assert _top <= end_offset(). _top is allowed to be at _end, but not go beyond that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1492868319

From duke at openjdk.org Sat Apr 1 11:50:39 2023
From: duke at openjdk.org (duke)
Date: Sat, 1 Apr 2023 11:50:39 GMT
Subject: Withdrawn: JDK-8296995: ostream should handle snprintf(3) errors in release builds
In-Reply-To:
References:
Message-ID:

On Tue, 15 Nov 2022 11:07:37 GMT, Thomas Stuefe wrote:

> Small fix for a very unlikely problem.
>
> All streams in ostream.hpp end up using `os::snprintf()`, which uses `::vsnprintf()`. `vsnprintf(3)` can fail and return -1.
>
> The chance for this to happen is small. snprintf errors are usually encoding errors though not always (see third example at https://stackoverflow.com/questions/65334245/what-is-an-encoding-error-for-sprintf-that-should-return-1).
I found "%ls" in one place in windows coding, so I am not sure we can always exclude the possibility of wide strings being used in our code base, or that of printing with outside-provided format strings. > > In case of an error, we assert in debug builds but don't handle it in release. There, this situation gets misdiagnosed later as a buffer overflow because we cast the signedness of the result away (see `outputStream::do_vsnprintf()`). > > --- > > The patch is trivial. The most exciting thing is the gtest, I guess. > > In release builds, we now treat this condition as an empty string write. I considered printing a clear marker into the stream instead, e.g. "ENCODING ERROR", but ultimately did not do it. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/11160 From coleenp at openjdk.org Sat Apr 1 13:27:17 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 1 Apr 2023 13:27:17 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v3] In-Reply-To: References: Message-ID: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> > The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. > Tested with tier1-8. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Missing null check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13270/files - new: https://git.openjdk.org/jdk/pull/13270/files/0d48d2a8..a53ab524 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13270&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13270&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13270/head:pull/13270 PR: https://git.openjdk.org/jdk/pull/13270 From jwaters at openjdk.org Sat Apr 1 16:50:32 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 1 Apr 2023 16:50:32 GMT Subject: RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting [v3] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 10:25:06 GMT, Kim Barrett wrote: >> Please review this change to the implementation of the Windows-specific option >> UseOSErrorReporting, toward allowing crash reporting functions to be declared >> noreturn. VMError::report_and_die no longer conditionally returns if the >> Windows-only option UseOSErrorReporting is true. >> >> The Windows-only sections of report_and_die now call RaiseFailFastException >> (https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception), >> which immediately invokes WER (Windows Error Reporting) if it is enabled, >> without executing structured exception handler. If WER is not enabled, it >> just immediately terminates the program. Thus, we no longer return to walk up >> thestructured exception handler chain to pop out at the top as unhandled in >> order to invoke WER. >> >> This permits declaring report_and_die as [[noreturn]], once some functions >> from the os class are also so declared. Also adding that attribute as >> appropriate to other functions in the os class. 
This of course assumes >> the use of [[noreturn]] in HotSpot code is approved (JDK-8302124). >> >> There is a pre-existing bug that I'll be reporting separately. If >> UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an >> empty .mdmp file. We shouldn't create that file when UseOSErrorReporting. >> >> Testing: >> mach5 tier1-3 >> >> Manual testing with the following, to verify desired behavior. >> >> -XX:ErrorHandlerTest=N >> 1: assertion failure >> 2: guarantee failure >> 14: SIGSEGV >> 15: divide by zero >> path/to/bin/java \ >> -XX:+UnlockDiagnosticVMOptions \ >> -XX:+ErrorLogSecondaryErrorDetails \ >> -XX:+UseOSErrorReporting \ >> -XX:ErrorHandlerTest=1 \ >> TestDebug.java >> >> --- TestDebug.java --- >> import java.lang.String; >> public class TestDebug { >> static private volatile String dummy; >> public static void main(String[] args) throws Exception { >> while (true) { >> dummy = new String("foo bar"); >> } >> } >> } >> --- end TestDebug.java --- >> >> The state of WER can be examined and modified using Power Shell commands >> {Get,Enable,Disable}-WindowsErrorReporting. >> >> The state of reporting WER captured errors can be examined and modified using >> Control Panel > Security and Maintenance > Maintenance : Report Problems [on,off] >> >> With Report Problems off, reports are placed in >> c:\ProgramData\Microsoft\Windows\WER\ReportArchive >> >> I verified that executing the above test with WER enabled adds an entry in >> that directory, but not when it's disabled. Also nothing is added there when >> the test is run with -XX:-UseOSErrorReporting. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into failfast > - remove failfast cuttoff of secondary errors > - failfast After some digging it looks like RaiseFailFastException isn't noreturn, which is rather strange ------------- PR Comment: https://git.openjdk.org/jdk/pull/12759#issuecomment-1493042334 From darcy at openjdk.org Sat Apr 1 18:15:16 2023 From: darcy at openjdk.org (Joe Darcy) Date: Sat, 1 Apr 2023 18:15:16 GMT Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources Message-ID: This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. 
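For readers unfamiliar with the operation that blocked the first removal attempt: `StrictMath.IEEEremainder` computes the IEEE 754 remainder, which rounds the quotient to the nearest integer (ties to even) instead of truncating it. The following is a minimal C++ sketch of that difference, not the Java port from JDK-8304028; `std::remainder` and `std::fmod` are the standard library functions, while the wrapper names `ieee_remainder` and `truncating_remainder` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cmath>

// IEEE 754 remainder: x - n*y, where n is x/y rounded to the nearest
// integer, with ties going to the even integer. The result can be negative
// even when both arguments are positive.
// (Hypothetical wrapper name, for illustration only.)
inline double ieee_remainder(double x, double y) {
  return std::remainder(x, y);
}

// Truncating remainder for comparison: x - n*y, where n is x/y rounded
// toward zero. The result has the same sign as x.
// (Hypothetical wrapper name, for illustration only.)
inline double truncating_remainder(double x, double y) {
  return std::fmod(x, y);
}
```

For example, with `x = 5.0` and `y = 3.0` the quotient 1.67 rounds to 2 for the IEEE remainder, giving `5 - 2*3 = -1.0`, whereas the truncating version uses 1 and gives `2.0`.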
------------- Commit messages: - JDK-8303798: REDO - Remove fdlibm C sources Changes: https://git.openjdk.org/jdk/pull/13279/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13279&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303798 Stats: 6516 lines in 64 files changed: 20 ins; 6486 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13279/head:pull/13279 PR: https://git.openjdk.org/jdk/pull/13279 From stuefe at openjdk.org Sun Apr 2 06:22:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 2 Apr 2023 06:22:43 GMT Subject: Integrated: JDK-8304815: Use NMT for more precise hs_err location printing In-Reply-To: References: Message-ID: <-TcctwEaeb8RloyXyG3WIjDExAIqToRGSt1y5EyrBLI=.96cb873f-739d-465e-bd38-b95e5c2f0b97@github.com> On Thu, 23 Mar 2023 16:51:52 GMT, Thomas Stuefe wrote: > (This is a byproduct of work on the arm port for https://github.com/openjdk/jdk/pull/10907. I needed better debugging information in the hs-err file and in gdb.) > > Back in 2022 @zhengyu123 had the very nice idea of using NMT mapping info for smartening up pp in gdb: [JDK-8280289](https://bugs.openjdk.org/browse/JDK-8280289). > > The same idea can be applied to hs_err file location printing. NMT has information about all its mappings and can tell us where it thinks a given unknown pointer points into. > > This could be even more useful if the "find malloc block" part of that functionality would be smarter. As it is now, it only works if the pointer in question points to the start of a user-allocated area. Would be nice if the code could (carefully) search for the next valid-looking malloc header instead. > > -------------- > > This patch does this: we introduce a new API, MemTracker::print_containing_region(void*), that tries to make sense of a given unknown pointer. > > It will search its mmap regions and print those if found. 
For malloc'ed pointers, it will carefully sniff out the immediate surroundings of the block, trying to find what looks like a valid malloc header. It uses SafeFetch to not trip over unmapped or protected pages. Note that, of course, we may get false recognition positives if it finds something that looks like a valid header. But even that could be useful (e.g. a remnant dead header may indicate we access memory after free). > > Looks like this (its arm, so 32-bit pointers): > > > Register to memory mapping: > > -> r0 = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer > -> r1 = 0x75b02010 into life malloced block starting at 0x75b02010, size 184, tag mtObjectMonitor > -> r2 = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer > r3 = 0x0 is nullptr > -> r4 = 0x728a6ae0 into life malloced block starting at 0x728a6ae0, size 104, tag mtSynchronizer > r5 = 0xb6d3bbc8: in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000 > r6 = 0xffffffff is an unknown value > r7 = 0x0 is nullptr > r8 = 0x0000000a is an unknown value > r9 = 0x728a4308 is a thread > r10 = 0x0000000b is an unknown value > -> fp = 0x753fe8cc in mmap'd memory region [0x75380000 - 0x75400000] by Thread Stack > r12 = 0x0 is nullptr > -> sp = 0x753fe8b0 in mmap'd memory region [0x75380000 - 0x75400000] by Thread Stack > lr = 0xb69bc1d0: in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000 > pc = 0xb69c8670: in /shared/projects/openjdk/jdk-jdk/output-fastdebug-arm/images/jdk/lib/server/libjvm.so at 0xb5445000 > > > > The small caveat here is that NMT reporting needs ThreadCritical, and thus using it for location printing may block error reporting if it crashed inside a ThreadCritical section. We face the same issue today when printing the NMT report as part of the hs-err file. 
> > I think the usefulness of these printouts justify this risk. However I opened https://bugs.openjdk.org/browse/JDK-8304824 to investigate a better locking strategy for NMT. This pull request has now been integrated. Changeset: 41a3db26 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/41a3db267d0cae9d53117768406b1b7ef1611c91 Stats: 268 lines in 13 files changed: 223 ins; 8 del; 37 mod 8304815: Use NMT for more precise hs_err location printing Reviewed-by: jsjolen, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/13162 From stuefe at openjdk.org Sun Apr 2 06:24:40 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 2 Apr 2023 06:24:40 GMT Subject: Integrated: JDK-8294266: Add a way to pre-touch java thread stacks In-Reply-To: References: Message-ID: On Fri, 23 Sep 2022 09:39:41 GMT, Thomas Stuefe wrote: > When doing performance- and footprint analysis, `AlwaysPreTouch` option is very handy for reducing noise. It would be good to have a similar option for pre-touching thread stacks. In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions. > > Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. Touching is super-simple using `alloca()`. Also, regression test. > > Examples: > > NMT, thread stacks, 10000 Threads, default: > > > - Thread (reserved=10332400KB, committed=331828KB) > (thread #10021) > (stack: reserved=10301560KB, committed=300988KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) > > > NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks: > > > - Thread (reserved=10332400KB, committed=10284360KB) > (thread #10021) > (stack: reserved=10301560KB, committed=10253520KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) This pull request has now been integrated. 
Changeset: b8c748db Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/b8c748dbe468582b9f2a73b17da47148e64cd186 Stats: 153 lines in 4 files changed: 153 ins; 0 del; 0 mod 8294266: Add a way to pre-touch java thread stacks Reviewed-by: rehn, gziemski ------------- PR: https://git.openjdk.org/jdk/pull/10403 From jwaters at openjdk.org Sun Apr 2 06:43:24 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 2 Apr 2023 06:43:24 GMT Subject: RFR: 8250269: Replace ATTRIBUTE_ALIGNED with alignas [v15] In-Reply-To: References: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> Message-ID: On Sat, 4 Feb 2023 15:05:06 GMT, Julian Waters wrote: >> C++11 added the alignas attribute, for the purpose of specifying alignment on types, much like compiler specific syntax such as gcc's __attribute__((aligned(x))) or Visual C++'s __declspec(align(x)). >> >> We can phase out the use of the macro in favor of the standard attribute. In the meantime, we can replace the compiler specific definitions of ATTRIBUTE_ALIGNED with a portable definition. We might deprecate the use of the macro but changing its implementation quickly and cleanly applies the feature where the macro is being used. >> >> Note: With certain parts of HotSpot using ATTRIBUTE_ALIGNED so indiscriminately, this commit will likely take some time to get right >> >> This will require adding the alignas attribute to the list of language features approved for use in HotSpot code. (Completed with [8297912](https://github.com/openjdk/jdk/pull/11446)) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - ... and 5 more: https://git.openjdk.org/jdk/compare/88837f09...a621bb62 :( ------------- PR Comment: https://git.openjdk.org/jdk/pull/11431#issuecomment-1493245977 From alanb at openjdk.org Sun Apr 2 07:27:18 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 2 Apr 2023 07:27:18 GMT Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 18:08:44 GMT, Joe Darcy wrote: > This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. > > This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. I assume at least tier1-4 has been run, in which case this looks good (same as previous PR). ------------- Marked as reviewed by alanb (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13279#pullrequestreview-1368033821 From tanksherman27 at gmail.com Sun Apr 2 14:26:50 2023 From: tanksherman27 at gmail.com (Julian Waters) Date: Sun, 2 Apr 2023 22:26:50 +0800 Subject: ALLOW_C_FUNCTION's first parameter doesn't seem very helpful Message-ID: Hi everyone, I couldn't help but notice that the method name passed to the macro to allow use of itself isn't really needed in any way. Comments in the macro definition say it's to show exactly what method is being permitted by the macro, but to me it seems the actual call to it that happens in the second half of the macro already does that job entirely // It's already obvious from the actual method call itself that we want std::_Exit ALLOW_C_FUNCTION(::_Exit, ::_Exit(code);) Is there a reason we still keep the first parameter around other than for documentation purposes? best regards, Julian -------------- next part -------------- An HTML attachment was scrubbed... URL: From duke at openjdk.org Sun Apr 2 15:43:26 2023 From: duke at openjdk.org (ExE Boss) Date: Sun, 2 Apr 2023 15:43:26 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v16] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 11:28:25 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java > > Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> > - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java > > Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> 
src/java.base/share/classes/java/lang/foreign/Linker.java line 638:

> 636: * .map(MemoryLayout::name)
> 637: * .filter(Optional::isPresent)
> 638: * .map(Optional::get)

[`Optional::stream()`] was added specifically for this:
Suggestion:

 * .flatMap(Optional::stream)

[`Optional::stream()`]: https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/Optional.html#stream()

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1155333484

From darcy at openjdk.org Sun Apr 2 20:53:28 2023
From: darcy at openjdk.org (Joe Darcy)
Date: Sun, 2 Apr 2023 20:53:28 GMT
Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources
In-Reply-To:
References:
Message-ID: <2Wk9KXrusm7rZrJ_f4J8hkUsFw6XmojEJn3U_J9SOzM=.d4e01c52-a73e-4b4c-a31a-042208b05ee8@github.com>

On Sun, 2 Apr 2023 07:24:24 GMT, Alan Bateman wrote:

> I assume at least tier1-4 has been run, in which case this looks good (same as previous PR).

Right; tier 1 - 4 job was successful other than an unrelated time-out. Previously, with the initial removal attempt there were many failures in tiers 2 and 3.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13279#issuecomment-1493437003

From rkennke at openjdk.org Sun Apr 2 21:41:47 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Sun, 2 Apr 2023 21:41:47 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v51]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable.
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with four additional commits since the last revision:

 - Replace UseFastLocking with LockingMode flag
 - Reject +UseFastLocking on unsupported platforms
 - Merge remote-tracking branch 'tstuefe/ARM-port-8291555' into JDK-8291555-v2
 - Arm port

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/298031b6..1348f3bc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=50
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=49-50

  Stats: 650 lines in 39 files changed: 328 ins; 35 del; 287 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From dholmes at openjdk.org Sun Apr 2 23:11:17 2023
From: dholmes at openjdk.org (David Holmes)
Date: Sun, 2 Apr 2023 23:11:17 GMT
Subject: RFR: 8304033:
JFR: Missing thread In-Reply-To: References: Message-ID: <8VgnMZiSIQQfSwiEg9Pi3ZrJ3qIUDdXDrxOibqKrsME=.0398a1ed-6b7a-45a5-8ccb-af89b8893ddd@github.com> On Fri, 31 Mar 2023 16:42:55 GMT, Markus Grönlund wrote: > Greetings, > > please help review this small adjustment to fix the lack of thread information in certain situations, more specifically associated with JNI_AttachThread and JNI_DetachThread. The old site posting a thread start event is correct in getting the correct thread id, but the thread does not write its checkpoint at that location, which is required after JFR Event Streaming. > > The fix is to let the sites in jni.cpp go through the "normal" thread start entry point. > > Testing: jdk_jfr > > Thanks > Markus Seems reasonable ... though does beg the question why it wasn't done this way from the start? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13272#pullrequestreview-1368155282 From dholmes at openjdk.org Sun Apr 2 23:20:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 2 Apr 2023 23:20:18 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v3] In-Reply-To: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> References: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> Message-ID: On Sat, 1 Apr 2023 13:27:17 GMT, Coleen Phillimore wrote: >> The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. >> Tested with tier1-8.
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Missing null check Seems fine based on the analysis that has been done. Hope no one tries to backport this without also doing thorough analysis! Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13270#pullrequestreview-1368156833 From fyang at openjdk.org Mon Apr 3 00:45:18 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 3 Apr 2023 00:45:18 GMT Subject: RFR: 8305247: On RISC-V generate_fixed_frame() sometimes generate a relativized locals value which is way too large. In-Reply-To: References: Message-ID: <33fOKYnyNlw2qToibd1Jb8SnKQlGEQUUgjZwEcYjqaM=.e8279851-7bbf-4ac6-84e8-c9554d46df91@github.com> On Thu, 30 Mar 2023 12:57:23 GMT, Fredrik Bredberg wrote: > The relativized locals value is supposed to contain the distance between the frame pointer and the local variables in an interpreter frame, expressed in number of words. It typically contains the value "frame::sender_sp_offset + padding + max_locals - 1" > > On most architectures sender_sp_offset is 2. This gives us the value "1 + padding + max_locals", which is always greater than or equal to 1. > > However on RISC-V the value of frame::sender_sp_offset is 0, which means that if we don't have any padding and no local variables we end up with a relativized_locals value of -1. > > When generate_fixed_frame() calculates the relativized_locals value it subtracts the frame pointer from the xlocals and then logically shifts the result right by Interpreter::logStackElementSize (to convert it into a word index). > > This works fine on all platforms (except RISC-V), because the subtraction will never become negative. But since the subtraction can end up negative on RISC-V, the shift instruction must be an arithmetic-shift-right (not a logical-shift-right) to preserve the sign and not end up with a very large positive index.
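The difference between the two shifts can be shown with a minimal Java sketch. This is not HotSpot code: it assumes an 8-byte stack slot, so the hypothetical constant LOG_STACK_ELEMENT_SIZE is 3, and it uses Java's >>> and >> operators as stand-ins for the RISC-V srli and srai instructions. The class and method names are invented for the example.

```java
public class ShiftDemo {
    // Assumption: 8-byte stack slots, so log2(slot size) is 3.
    static final int LOG_STACK_ELEMENT_SIZE = 3;

    // Logical shift right (like srli): zero-fills, so the sign is lost.
    static long logicalIndex(long byteDiff) {
        return byteDiff >>> LOG_STACK_ELEMENT_SIZE;
    }

    // Arithmetic shift right (like srai): sign-extends, so the sign is kept.
    static long arithmeticIndex(long byteDiff) {
        return byteDiff >> LOG_STACK_ELEMENT_SIZE;
    }

    public static void main(String[] args) {
        // A byte distance of -8 corresponds to a word index of -1.
        long byteDiff = -8L;
        System.out.println(logicalIndex(byteDiff));    // 2305843009213693951
        System.out.println(arithmeticIndex(byteDiff)); // -1
        // For non-negative distances both shifts agree.
        System.out.println(logicalIndex(16L) == arithmeticIndex(16L)); // true
    }
}
```

For non-negative distances the two shifts give the same result, which would be consistent with the problem only showing up where the subtraction can go negative.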
> > This is currently not a real problem since the relativized_locals value is not used if max_local is zero, which is the only case the value is wrong. > > It is however a real problem when implementing JDK-8300197. > > The bug was introduced in JDK-8299795 and is fixed by changing a "srli" instruction to a "srai" in generate_fixed_frame(). Looks reasonable. I performed tier1-3 tests on my linux-riscv64 boards, result looks good. BTW: You might want to change the issue title removing the '.' symbol. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13245#pullrequestreview-1368173997 From xgong at openjdk.org Mon Apr 3 01:57:25 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Apr 2023 01:57:25 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 12:25:16 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > small cosmetics Looks good to me! Thanks! ------------- Marked as reviewed by xgong (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/13093#pullrequestreview-1368199580 From stuefe at openjdk.org Mon Apr 3 06:00:47 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 3 Apr 2023 06:00:47 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16] In-Reply-To: References: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> Message-ID: <2QX2qfq-D7XFxp-deWNW7JyrZRMikCDt9LLe8EyQKx8=.1dc3d0be-b6f4-4651-a6d5-1b4aced23ba9@github.com> On Fri, 31 Mar 2023 15:34:12 GMT, Matias Saavedra Silva wrote: > > This obviously breaks arm, since its implementation is missing. I opened https://bugs.openjdk.org/browse/JDK-8305387 to track this. This is unfortunate since it holds work on arm in other areas, in my case for #10907. > > > This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x > > > > > > I wonder about the explicit exclusion of arm. Every other CPU seems to be taken care of, even those Oracle does not maintain. Just curious, was there a special reason for excluding arm? > > There is no special reason ARM32 was excluded other than the fact no porter has picked it up yet. Fortunately I was able to get in contact with porters for the other platforms, but nobody took on the ARM port until now. Thank you for opening the issue! I lack the time to do this atm; let's see if one of the porters can help. @bulasevich @snazarkin ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1493703283 From stuefe at openjdk.org Mon Apr 3 06:13:58 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 3 Apr 2023 06:13:58 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16] In-Reply-To: <2QX2qfq-D7XFxp-deWNW7JyrZRMikCDt9LLe8EyQKx8=.1dc3d0be-b6f4-4651-a6d5-1b4aced23ba9@github.com> References: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> <2QX2qfq-D7XFxp-deWNW7JyrZRMikCDt9LLe8EyQKx8=.1dc3d0be-b6f4-4651-a6d5-1b4aced23ba9@github.com> Message-ID: On Mon, 3 Apr 2023 05:57:18 GMT, Thomas Stuefe wrote: > > I wonder about the explicit exclusion of arm. Every other CPU seems to be taken care of, even those Oracle does not maintain. Just curious, was there a special reason for excluding arm? > > There is no special reason ARM32 was excluded other than the fact no porter has picked it up yet. Fortunately I was able to get in contact with porters for the other platforms, but nobody took on the ARM port until now. Thank you for opening the issue! For future reference, we maintain a list of people working on ports: https://wiki.openjdk.org/display/HotSpot/Ports and we do have a mailing list for porters as well: porters-dev at openjdk.org . This makes it easier to find out who to contact. Cheers, Thomas ------------- PR Comment: https://git.openjdk.org/jdk/pull/12778#issuecomment-1493717962 From iris at openjdk.org Mon Apr 3 06:38:18 2023 From: iris at openjdk.org (Iris Clark) Date: Mon, 3 Apr 2023 06:38:18 GMT Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 18:08:44 GMT, Joe Darcy wrote: > This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. 
Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. > > This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. Marked as reviewed by iris (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13279#pullrequestreview-1368368000 From jwaters at openjdk.org Mon Apr 3 06:45:23 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 3 Apr 2023 06:45:23 GMT Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 18:08:44 GMT, Joe Darcy wrote: > This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. > > This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. Marked as reviewed by jwaters (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13279#pullrequestreview-1368374542 From wojciech.kudla at hsbc.com Mon Apr 3 07:35:12 2023 From: wojciech.kudla at hsbc.com (Wojciech KUDLA) Date: Mon, 3 Apr 2023 07:35:12 +0000 Subject: [PATCH] Added support for fractional values of SafepointTimeoutDelay Message-ID: Hi everyone, My organization uses Java to run latency-sensitive workloads; we often engage in performance troubleshooting of the compiled code or the VM itself. We typically avoid STW altogether but when a safepoint happens it's always a sub-millisecond pause. 
The SafepointTimeoutDelay is great for quickly identifying threads that failed to park themselves in a timely manner and are causing long time-to-safepoint issues, but the functionality works with millisecond granularity, so it doesn't cover our use cases at all. We have a very small patch that introduces sub-millisecond granularity that we'd like to share with the community. We have just signed the OCA and this would be our first contribution so we'd appreciate it if someone could sponsor this. The change is very straightforward but to preserve backwards compatibility we decided to replace the type of SafepointTimeoutDelay with double, which I imagine might be the topic of various opinions clashing. We're very happy to engage in these discussions and shape the patch to the community's standards. Here it is in its entirety: --- src/hotspot/share/runtime/globals.hpp | 7 ++++--- src/hotspot/share/runtime/safepoint.cpp | 4 ++-- src/hotspot/share/utilities/globalDefinitions.hpp | 3 +++ 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp index 483287e6262..d694758829b 100644 --- a/src/hotspot/share/runtime/globals.hpp +++ b/src/hotspot/share/runtime/globals.hpp @@ -1289,9 +1289,10 @@ const int ObjectAlignmentInBytes = 8; "(0 means none)") \ range(0, max_jint) \ \ - product(intx, SafepointTimeoutDelay, 10000, \ - "Delay in milliseconds for option SafepointTimeout") \ - range(0, max_intx LP64_ONLY(/MICROUNITS)) \ + product(double, SafepointTimeoutDelay, 10000, \ + "Delay in milliseconds for option SafepointTimeout; " \ + "supports sub-millisecond resolution with fractional values.") \ + range(0, max_jlongDouble LP64_ONLY(/MICROUNITS)) \ \ product(bool, UseSystemMemoryBarrier, false, EXPERIMENTAL, \ "Try to enable system memory barrier") \ diff --git a/src/hotspot/share/runtime/safepoint.cpp b/src/hotspot/share/runtime/safepoint.cpp index 2ff593a0143..7a27aaf804c 100644 ---
a/src/hotspot/share/runtime/safepoint.cpp +++ b/src/hotspot/share/runtime/safepoint.cpp @@ -379,7 +379,7 @@ void SafepointSynchronize::begin() { if (SafepointTimeout) { // Set the limit time, so that it can be compared to see if this has taken // too long to complete. - safepoint_limit_time = SafepointTracing::start_of_safepoint() + (jlong)SafepointTimeoutDelay * (NANOUNITS / MILLIUNITS); + safepoint_limit_time = SafepointTracing::start_of_safepoint() + + (jlong)SafepointTimeoutDelay * NANOSECS_PER_MILLISEC; timeout_error_printed = false; } @@ -795,7 +795,7 @@ void SafepointSynchronize::print_safepoint_timeout() { os::naked_sleep(3000); } } - fatal("Safepoint sync time longer than " INTX_FORMAT "ms detected when executing %s.", + fatal("Safepoint sync time longer than " JDOUBLE_FORMAT_P(6) "ms + detected when executing %s.", SafepointTimeoutDelay, VMThread::vm_operation()->name()); } } diff --git a/src/hotspot/share/utilities/globalDefinitions.hpp b/src/hotspot/share/utilities/globalDefinitions.hpp index 41ff5150243..1570fde7477 100644 --- a/src/hotspot/share/utilities/globalDefinitions.hpp +++ b/src/hotspot/share/utilities/globalDefinitions.hpp @@ -151,6 +151,9 @@ class oopDesc; #define UINTX_FORMAT_X "0x%" PRIxPTR #define UINTX_FORMAT_W(width) "%" #width PRIuPTR +// Format jdouble with defined precision #define +JDOUBLE_FORMAT_P(precision) "%." #precision "f" + // Format jlong, if necessary #ifndef JLONG_FORMAT #define JLONG_FORMAT INT64_FORMAT -- Kind regards Wojciech Kudla HSBC Bank plc PUBLIC ----------------------------------------- SAVE PAPER - THINK BEFORE YOU PRINT! This E-mail is confidential. It may also be legally privileged. If you are not the addressee you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return E-mail. Internet communications cannot be guaranteed to be timely secure, error or virus-free. 
The sender does not accept liability for any errors or omissions. From wojciech.kudla at hsbc.com Mon Apr 3 07:56:50 2023 From: wojciech.kudla at hsbc.com (Wojciech KUDLA) Date: Mon, 3 Apr 2023 07:56:50 +0000 Subject: [PATCH] Added support for grace period before AbortVMOnSafepointTimeout triggers Message-ID: Hi everyone, Our bank uses Java for low-latency applications and we sometimes need to dig into a long time-to-safepoint pause in greater detail. AbortVMOnSafepointTimeout is extremely useful but we want to avoid putting the JVM in debug while it is still warming up or bootstrapping an application. For that reason we introduced a grace period that avoids triggering the AbortVMOnSafepointTimeout functionality until some preconfigured time after the JVM startup. The patch is extremely simple and tangential to the SafepointTimeoutDelay patch we submitted this morning: https://mail.openjdk.org/pipermail/hotspot-dev/2023-April/072455.html As stated in that thread - we recently signed the OCA and this is the second of two patches we'd like to contribute; I think it will require someone to sponsor it. We're happy to continue discussing this patch here to ensure the change meets the community's requirements. 
--- src/hotspot/share/runtime/globals.hpp | 5 +++++ src/hotspot/share/runtime/safepoint.cpp | 3 ++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp index 6006063421e..05ff1228050 100644 --- a/src/hotspot/share/runtime/globals.hpp +++ b/src/hotspot/share/runtime/globals.hpp @@ -426,6 +426,11 @@ const int ObjectAlignmentInBytes = 8; product(bool, AbortVMOnSafepointTimeout, false, DIAGNOSTIC, \ "Abort upon failure to reach safepoint (see SafepointTimeout)") \ \ + product(uint64_t, AbortVMOnSafepointTimeoutDelay, 0, DIAGNOSTIC, \ + "Enable option AbortVMOnSafepointTimeout after this many " \ + "milliseconds since JVM startup") \ + range(0, max_jlong) \ + \ product(bool, AbortVMOnVMOperationTimeout, false, DIAGNOSTIC, \ "Abort upon failure to complete VM operation promptly") \ \ diff --git a/src/hotspot/share/runtime/safepoint.cpp b/src/hotspot/share/runtime/safepoint.cpp index 2ff593a0143..42d41f74d5d 100644 --- a/src/hotspot/share/runtime/safepoint.cpp +++ b/src/hotspot/share/runtime/safepoint.cpp @@ -67,6 +67,7 @@ #include "runtime/threadSMR.hpp" #include "runtime/threadWXSetters.inline.hpp" #include "runtime/timerTrace.hpp" +#include "services/management.hpp" #include "services/runtimeService.hpp" #include "utilities/events.hpp" #include "utilities/macros.hpp" @@ -784,7 +785,7 @@ void SafepointSynchronize::print_safepoint_timeout() { // To debug the long safepoint, specify both AbortVMOnSafepointTimeout & // ShowMessageBoxOnError. - if (AbortVMOnSafepointTimeout) { + if (AbortVMOnSafepointTimeout && Management::ticks_to_ms(os::elapsed_counter()) > (jlong)AbortVMOnSafepointTimeoutDelay) { // Send the blocking thread a signal to terminate and write an error file. 
for (JavaThreadIteratorWithHandle jtiwh; JavaThread *cur_thread = jtiwh.next(); ) { if (cur_thread->safepoint_state()->is_running()) { -- Thanks Wojciech Kudla HSBC Bank plc PUBLIC From sgehwolf at redhat.com Mon Apr 3 08:28:27 2023 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Mon, 03 Apr 2023 10:28:27 +0200 Subject: test failures on the Linux ARM Thumb platforms due to os::current_frame() implementation In-Reply-To: References: Message-ID: <515a9370a9b900abcfc7d38d72c3bd888bac3568.camel@redhat.com> Hi, On Mon, 2023-04-03 at 10:41 +1200, Vladimir Petko wrote: > Hi, > > os::current_frame() is stubbed out on the Linux Arm Thumb platform [1] > This causes gtest 'is_first_C_frame' to fail. The attached > 'disable-thumb-assertion.patch' patch disables assertion for the ARM > Thumb platform. > > This also affects > 'MemDetailReporter::report_virtual_memory_region()'[2] causing the > trace to miss 'from ' part which breaks > 'test/hotspot/jtreg/runtime/NMT/VirtualAllocCommitMerge.java'. > The attached patch 'update-assertion-for-armhf.patch' relaxes the > assertion allowing the test to pass. This belongs to hotspot-dev (cc). Bcc jdk-dev. Could you perhaps create a PR at https://github.com/openjdk/jdk ? Thanks, Severin > Best Regards, > Vladimir.
> > [1] https://github.com/openjdk/jdk/blob/0deb648985b018653ccdaf193dc13b3cf21c088a/src/hotspot/os_cpu/linux_arm/os_linux_arm.cpp#L219 > [2] https://github.com/openjdk/jdk/blob/aa762102e9328ca76663b56b3be6f6141b044744/src/hotspot/share/services/memReporter.cpp#L397 From jbechberger at openjdk.org Mon Apr 3 08:29:55 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 3 Apr 2023 08:29:55 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v5] In-Reply-To: References: Message-ID: > Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. > > Tested on my M1 mac. Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Fix fix - Fix minor issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13144/files - new: https://git.openjdk.org/jdk/pull/13144/files/1973e005..6f1108ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13144/head:pull/13144 PR: https://git.openjdk.org/jdk/pull/13144 From jbechberger at openjdk.org Mon Apr 3 08:29:55 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 3 Apr 2023 08:29:55 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v4] In-Reply-To: References: Message-ID: On Fri, 24 Mar 2023 10:35:36 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Remove misc lines > - Disable caching in ASGCT Thanks for the review, I fixed the issues. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1493893275 From rehn at openjdk.org Mon Apr 3 08:33:21 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 3 Apr 2023 08:33:21 GMT Subject: RFR: 8305247: On RISC-V generate_fixed_frame() sometimes generate a relativized locals value which is way too large. In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 12:57:23 GMT, Fredrik Bredberg wrote: > The relativized locals value is supposed to contain the distance between the frame pointer and the local variables in an interpreter frame, expressed in number of words. It typically contains the value "frame::sender_sp_offset + padding + max_locals - 1" > > On most architectures sender_sp_offset is 2. This gives us the value "1 + padding + max_locals", which is always greater or equal to 1. > > However on RISC-V the value of frame::sender_sp_offset is 0, which means that if we don't have any padding and no local variables we end up with a relativized_locals value of -1. > > When generate_fixed_frame() calculates the relativized_locals value it subtracts the frame pointer from the xlocals and then logically shifts the result right by Interpreter::logStackElementSize (to convert it into a word index). > > This works fine on all platforms (except RISC-V), because the subtraction will never become negative. But since the subtraction can end up negative on RISC-V, the shift instruction must be a arithmetic-shift-right (not a logical-shift-right) to preserve the sign and not end up with a very large positive index. > > This is currently not a real problem since the relativized_locals value is not used if max_local is zero, which is the only case the value is wrong. > > It is however a real problem when implementing JDK-8300197. > > The bug was introduced in JDK-8299795 and is fixed by changing a "srli" instruction to a "srai" in generate_fixed_frame(). Thanks ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13245#pullrequestreview-1368541565 From fyang at openjdk.org Mon Apr 3 08:40:58 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 3 Apr 2023 08:40:58 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v51] In-Reply-To: References: Message-ID: On Sun, 2 Apr 2023 21:41:47 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?'
is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions.
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench).
Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ?
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with four additional commits since the last revision: > > - Replace UseFastLocking with LockingMode flag > - Reject +UseFastLocking on unsupported platforms > - Merge remote-tracking branch 'tstuefe/ARM-port-8291555' into JDK-8291555-v2 > - Arm port src/hotspot/share/runtime/javaThread.cpp line 993: > 991: > 992: bool JavaThread::is_lock_owned(address adr) const { > 993: assert(!LockingMode != 2, "should not be called with new lightweight locking"); Looks like there is a typo here. I think this should be: assert(LockingMode != 2, "should not be called with new lightweight locking"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1155652106 From sspitsyn at openjdk.org Mon Apr 3 09:04:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Apr 2023 09:04:44 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v15] In-Reply-To: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com> References: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com> Message-ID: On Fri, 31 Mar 2023 11:18:23 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. 
>> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). 
This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>> jdk.NativeAgent {
>>   startTime = 12:31:40.398 (2023-03-08)
>>   name = "jdwp"
>>   options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>   dynamic = false
>>   initializationTime = 12:31:36.142 (2023-03-08)
>>   initializationDuration = 0,00184 ms
>>   path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>> }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported.
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent.
This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
>>
>> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp.
>>
>> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>>
>> Testing: jdk_jfr, tier 1 - 6
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   fixes

src/hotspot/share/prims/agentList.cpp line 204:

> 202:
> 203: // Invokes Agent_OnAttach for agents loaded dynamically during runtime.
> 204: jint AgentList::load_agent(const char* agent_name, const char* absParam,

I feel that it is better to keep the original function name "load_agent_library". As you listed, there are two kinds of agents: Java and Native. The function name gives a hint that it is a native agent. Also, it is better to avoid changes that aren't really necessary.

src/hotspot/share/prims/jvmtiExport.cpp line 694:

> 692: }
> 693:
> 694: // Lookup an agent from an JvmtiEnv.Return agent only if it is not yet initialized.

A space is missing after the first dot.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1155051226 PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1155050084 From mdoerr at openjdk.org Mon Apr 3 09:40:56 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Apr 2023 09:40:56 GMT Subject: Integrated: 8303210: [linux, Windows] Make UseSystemMemoryBarrier available as product flag In-Reply-To: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> Message-ID: On Sat, 25 Feb 2023 09:17:57 GMT, Martin Doerr wrote: > I'd like to enable UseSystemMemoryBarrier by default on supported Operating Systems in order to improve performance of thread state transitions (I/O, JNI, foreign function calls, JIT compiler threads, etc.). See JBS issue for more details. > Unfortunately, the feature was not yet implemented on all platforms. I added the code, but need the platform maintainers to check if it can be used reliably (and ideally if the performance improves). It's easy to switch it off again in case of problems. > > Update: Startup performance and some benchmarks on some platforms are impacted (see below). So, this PR no longer switches it on by default. This pull request has now been integrated. 
Changeset: 4de24cdb Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/4de24cdbe65289bd99eace30399f20694441f0aa Stats: 61 lines in 13 files changed: 33 ins; 10 del; 18 mod 8303210: [linux, Windows] Make UseSystemMemoryBarrier available as product flag Reviewed-by: dholmes, rehn ------------- PR: https://git.openjdk.org/jdk/pull/12753 From rkennke at openjdk.org Mon Apr 3 11:05:41 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 3 Apr 2023 11:05:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: References: Message-ID: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/1348f3bc..13c84b5c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=51
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=50-51

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at amazon.de Mon Apr 3 11:09:02 2023
From: rkennke at amazon.de (Kennke, Roman)
Date: Mon, 3 Apr 2023 13:09:02 +0200
Subject: RFC: JEP draft: Compact Object Headers (64 bit) (Experimental)
Message-ID: <5fc36eb0-acf1-f0f1-22d6-7ebc55853220@amazon.de>

Hi,

We have revised the 'Lilliput' JEP draft and would like to solicit feedback from the wider HotSpot developers community:

https://openjdk.org/jeps/8294992
https://bugs.openjdk.org/browse/JDK-8294992

We'll be happy to revise it further based on discussions here.
We have seen some very promising and substantial improvements, both with artificial-ish benchmarks and with experiments that customers conducted. Our intention is to 'Submit' it soon (possibly by end of this week - if no objections arise).

Let us know what you think!

Thanks,
Roman

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Managing Directors: Christian Schlaeger, Jonathan Weiss
Registered at the Charlottenburg District Court under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879

From mgronlun at openjdk.org Mon Apr 3 11:54:58 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 3 Apr 2023 11:54:58 GMT
Subject: RFR: 8304033: JFR: Missing thread
In-Reply-To: <8VgnMZiSIQQfSwiEg9Pi3ZrJ3qIUDdXDrxOibqKrsME=.0398a1ed-6b7a-45a5-8ccb-af89b8893ddd@github.com>
References: <8VgnMZiSIQQfSwiEg9Pi3ZrJ3qIUDdXDrxOibqKrsME=.0398a1ed-6b7a-45a5-8ccb-af89b8893ddd@github.com>
Message-ID: <97ENgFRH0RqHNCCh7ISNCFBTv0hHJnJIYx0UFDDcoQ8=.f8fe9bea-9993-45b5-8d32-f839dacab117@github.com>

On Sun, 2 Apr 2023 23:08:13 GMT, David Holmes wrote:

> Seems reasonable ... though does beg the question why it wasn't done this way from the start?

From the start, threads did not need to write their checkpoint data. This was because there was no way to consume JFR data until the entire file, "the chunk", was finalized. Finalization is under the control of the JFR Recorder Thread, and one part of finalization is traversing all threads to write their checkpoint data.

With JFR Event Streaming, the chunk is always parsable; that is, we have the means to have multiple readers but a single writer. The file is written in "segments", and each segment must be fully resolvable. In this case, we must ensure that the checkpoint data is written before or to the same segment as the event. The JNI parts were overlooked.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13272#issuecomment-1494179599 From duke at openjdk.org Mon Apr 3 12:30:59 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Mon, 3 Apr 2023 12:30:59 GMT Subject: RFR: 8305247: On RISC-V generate_fixed_frame() sometimes generate a relativized locals value which is way too large In-Reply-To: <33fOKYnyNlw2qToibd1Jb8SnKQlGEQUUgjZwEcYjqaM=.e8279851-7bbf-4ac6-84e8-c9554d46df91@github.com> References: <33fOKYnyNlw2qToibd1Jb8SnKQlGEQUUgjZwEcYjqaM=.e8279851-7bbf-4ac6-84e8-c9554d46df91@github.com> Message-ID: On Mon, 3 Apr 2023 00:42:54 GMT, Fei Yang wrote: > Looks reasonable. I performed tier1-3 tests on my linux-riscv64 boards, result looks good. BTW: You might want to change the issue title removing the '.' symbol. Thanks for testing @RealFYang. I've changed the name as you suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13245#issuecomment-1494234096 From rkennke at openjdk.org Mon Apr 3 12:31:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 3 Apr 2023 12:31:40 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v51] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 08:37:51 GMT, Fei Yang wrote: > Looks like there is a typo here. I think this should be: > > ``` > assert(LockingMode != 2, "should not be called with new lightweight locking"); > ``` Thanks for catching this! I pushed a fix. You might want to try the riscv port again, I have made some changes that might have broken it. Also, you might want to compare it again with aarch64 (or x86_64), I have done a bunch of changes and improvements that you might want to adopt. Thanks! 
Roman

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1155894717

From rrich at openjdk.org Mon Apr 3 12:34:07 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Mon, 3 Apr 2023 12:34:07 GMT
Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 31 Mar 2023 23:10:28 GMT, Dean Long wrote:

> BTW, I experimented with a growable nmethod oops table in JDK-8294002, but ran into problems for reasons I can't remember. I think reserving a slot per call site would be the safest approach. However I saw a potential problem with platforms like Aarch64 that appear to store the metadata in two places, the move instruction sequence itself and the relocation metadata slot. If those two places aren't kept consistent then there could be a situation where the call site oop protects one of the values but not the other.

The problem is present in HEAD. Modification of Metadata references would require fixup if they are not stored as immediate operands in the instruction stream, as on x86. But this was never implemented (`metadata_Relocation::pd_fix_value()` is empty) since Metadata objects don't move. So if the [target Method* is cleared from a static stub](https://github.com/openjdk/jdk/blob/ecec611af6c6314d7a834392f38468ad3f390e2d/src/hotspot/share/code/compiledMethod.cpp#L661-L665), readers will see this, but executing threads would get the dangling pointer because the fixup wasn't done.

If the holder of the target is stored in an oop slot associated with the optimized call, then cleaning of static stubs can be removed because the holder will be kept alive if the caller is on stack. If it is not on stack and the holder is unloading, then the caller will be unloading too.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/12802#issuecomment-1494238375

From duke at openjdk.org Mon Apr 3 12:41:10 2023
From: duke at openjdk.org (Fredrik Bredberg)
Date: Mon, 3 Apr 2023 12:41:10 GMT
Subject: Integrated: 8305247: On RISC-V generate_fixed_frame() sometimes generate a relativized locals value which is way too large
In-Reply-To:
References:
Message-ID:

On Thu, 30 Mar 2023 12:57:23 GMT, Fredrik Bredberg wrote:

> The relativized locals value is supposed to contain the distance between the frame pointer and the local variables in an interpreter frame, expressed in number of words. It typically contains the value "frame::sender_sp_offset + padding + max_locals - 1"
>
> On most architectures sender_sp_offset is 2. This gives us the value "1 + padding + max_locals", which is always greater or equal to 1.
>
> However on RISC-V the value of frame::sender_sp_offset is 0, which means that if we don't have any padding and no local variables we end up with a relativized_locals value of -1.
>
> When generate_fixed_frame() calculates the relativized_locals value it subtracts the frame pointer from the xlocals and then logically shifts the result right by Interpreter::logStackElementSize (to convert it into a word index).
>
> This works fine on all platforms (except RISC-V), because the subtraction will never become negative. But since the subtraction can end up negative on RISC-V, the shift instruction must be an arithmetic-shift-right (not a logical-shift-right) to preserve the sign and not end up with a very large positive index.
>
> This is currently not a real problem since the relativized_locals value is not used if max_locals is zero, which is the only case the value is wrong.
>
> It is however a real problem when implementing JDK-8300197.
>
> The bug was introduced in JDK-8299795 and is fixed by changing a "srli" instruction to a "srai" in generate_fixed_frame().

This pull request has now been integrated.
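
To illustrate the srli/srai distinction described above, here is a minimal Java sketch. It is not code from the patch: Java's `>>>` and `>>` stand in for the RISC-V logical and arithmetic right shifts, and the class name, the value 3 assumed for the log2 of the stack element size, and the sample byte distances are all assumptions made for the example.

```java
// Hypothetical illustration only; names and constants are not taken from HotSpot.
final class RelativizedLocalsShift {
    // Assumed log2 of the stack element size in bytes (8-byte words on a 64-bit VM).
    static final int LOG_STACK_ELEMENT_SIZE = 3;

    // Converts a byte distance to a word index with a logical shift (the srli behaviour).
    static long toWordIndexLogical(long byteDistance) {
        return byteDistance >>> LOG_STACK_ELEMENT_SIZE;
    }

    // Converts a byte distance to a word index with an arithmetic shift (the srai behaviour).
    static long toWordIndexArithmetic(long byteDistance) {
        return byteDistance >> LOG_STACK_ELEMENT_SIZE;
    }

    public static void main(String[] args) {
        long oneWordBelow = -8L; // a distance of minus one word, as in the -1 case above
        System.out.println(toWordIndexLogical(oneWordBelow));    // huge positive value
        System.out.println(toWordIndexArithmetic(oneWordBelow)); // -1
    }
}
```

For non-negative distances both shifts agree, which matches the observation that the logical shift only misbehaves when the subtraction goes negative.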

Changeset: 33d09e58
Author:    Fredrik Bredberg
Committer: Robbin Ehn
URL:       https://git.openjdk.org/jdk/commit/33d09e587a87e545bb3f6d21c79bf497cd056815
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8305247: On RISC-V generate_fixed_frame() sometimes generate a relativized locals value which is way too large

Reviewed-by: fyang, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/13245

From mgronlun at openjdk.org Mon Apr 3 13:02:15 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 3 Apr 2023 13:02:15 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v15]
In-Reply-To:
References: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com>
Message-ID:

On Sat, 1 Apr 2023 03:47:26 GMT, Serguei Spitsyn wrote:

>> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fixes
>
> src/hotspot/share/prims/agentList.cpp line 204:
>
>> 202:
>> 203: // Invokes Agent_OnAttach for agents loaded dynamically during runtime.
>> 204: jint AgentList::load_agent(const char* agent_name, const char* absParam,
>
> I feel that it is better to keep the original function name "load_agent_library". As you listed, there are two kinds of agents: Java and Native. The function name gives a hint that it is a native agent. Also, it is better to avoid changes that aren't really necessary.

I changed the names because I found it very hard to understand what the old names represented: "AgentLibrary" vs "Library"? "add_init_agent" vs "add_instrumentation_agent", or even "add_loaded_agent"? It is also a bit confusing that "load_agent_library" would also include statically linked agents - no library is loaded there.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1155930115 From erikj at openjdk.org Mon Apr 3 13:16:59 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 3 Apr 2023 13:16:59 GMT Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources In-Reply-To: References: Message-ID: <7oVTjm0lBVMaZK64Qb6pH0rgOCMYcGBxI0SfahlZh6A=.cd616078-d921-4294-9a4f-3ad7ff1924c2@github.com> On Sat, 1 Apr 2023 18:08:44 GMT, Joe Darcy wrote: > This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. > > This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. Marked as reviewed by erikj (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13279#pullrequestreview-1369011040 From psandoz at openjdk.org Mon Apr 3 15:02:07 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 3 Apr 2023 15:02:07 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 12:25:16 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. 
>> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > small cosmetics Tier 2/3 tests passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1494486846 From psandoz at openjdk.org Mon Apr 3 15:20:21 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 3 Apr 2023 15:20:21 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 12:25:16 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > small cosmetics Marked as reviewed by psandoz (Reviewer). 
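For readers following the quoted `rearrange` example, a scalar model may help. The sketch below is plain Java over `int[]` arrays; it is not the Vector API and not the code under review. The class name `RearrangeSketch`, the method name, and the choice to reject every out-of-range index with `IndexOutOfBoundsException` are assumptions made for illustration. It only shows that the range check can be done directly on integral indices before the permutation is applied.

```java
import java.util.Arrays;

// Hypothetical scalar model of a lane rearrangement; not the Vector API itself.
class RearrangeSketch {
    // Returns r where r[i] = src[indexes[i]], after checking every index.
    static int[] rearrange(int[] src, int[] indexes) {
        int n = src.length;
        if (indexes.length != n) {
            throw new IllegalArgumentException("length mismatch");
        }
        // Range check on the integral indices (assumed policy: reject, do not wrap).
        for (int idx : indexes) {
            if (idx < 0 || idx >= n) {
                throw new IndexOutOfBoundsException("bad source index " + idx);
            }
        }
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            r[i] = src[indexes[i]];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 20, 30, 40};
        int[] v2 = {3, 2, 1, 0};
        System.out.println(Arrays.toString(rearrange(v1, v2))); // [40, 30, 20, 10]
    }
}
```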
------------- PR Review: https://git.openjdk.org/jdk/pull/13093#pullrequestreview-1369262809 From psandoz at openjdk.org Mon Apr 3 16:39:01 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 3 Apr 2023 16:39:01 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v4] In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 07:44:25 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as UTF-8 and UTF-16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify the `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: > > - instruction asserts > - Merge branch 'master' into sliceIntrinsics > - add comments explaining anonymous classes > - address reviews > - sse2, increase warmup > - aesthetic > - optimise 64B > - add jmh > - vector slice intrinsics With the latest PR I am observing failures with debug builds for test compiler/vectorapi/TestVectorSlice.java on both AVX512 machines and aarch64 machines. On AVX512 machines the test fails with JVM args `-XX:UseAVX=3` and `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` and results in a test assertion failure e.g., Caused by: java.lang.RuntimeException: assertEquals: expected 70 to equal 0 at jdk.test.lib.Asserts.fail(Asserts.java:594) at jdk.test.lib.Asserts.assertEquals(Asserts.java:205) at jdk.test.lib.Asserts.assertEquals(Asserts.java:189) at compiler.vectorapi.TestVectorSlice.lambda$testInts$2(TestVectorSlice.java:163) at compiler.vectorapi.TestVectorSlice.testInts(TestVectorSlice.java:181) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ... 7 more CPU flags are: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant tsc arch perfmon rep good nopl xtopology cpuid tsc known freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4 1 sse4 2 x2apic movbe popcnt tsc deadline timer aes xsave avx f16c rdrand hypervisor lahf lm abm 3dnowprefetch cpuid fault invpcid single ssbd ibrs ibpb stibp ibrs enhanced tpr shadow vnmi flexpriority ept vpid ept ad fsgsbase tsc adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt good wbnoinvd arat avx512vbmi umip pku ospke avx512 vbmi2 gfni vaes vpclmulqdq avx512 vnni avx512 bitalg avx512 vpopcntdq la57 rdpid md clear arch capabilities On aarch64 there is an IR rule failure. 
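As a reading aid for the slice discussion, the semantics described in the quoted PR text (concatenate the two inputs, then extract a window as long as one input) can be sketched with plain arrays. This is a scalar model only, assuming `int` lanes; the class name `SliceSketch`, the method signature, and the bounds policy on `origin` are illustrative assumptions and do not reproduce the Vector API or the intrinsic under review.

```java
import java.util.Arrays;

// Hypothetical scalar model of slicing two equally sized "vectors".
class SliceSketch {
    // Returns lanes [origin, origin + n) of the concatenation v1 ++ v2, where n = v1.length.
    static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        if (v2.length != n) {
            throw new IllegalArgumentException("length mismatch");
        }
        if (origin < 0 || origin > n) {
            throw new IndexOutOfBoundsException("origin " + origin);
        }
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            int j = origin + i;
            // Lanes below n come from the first input, the rest from the second.
            r[i] = (j < n) ? v1[j] : v2[j - n];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3};
        int[] b = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(a, b, 1))); // [1, 2, 3, 4]
    }
}
```

With `origin` equal to 0 the result is the first input, and with `origin` equal to the length it is the second input, which is a convenient sanity check when comparing against a failing test such as the one reported above.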
------------- PR Comment: https://git.openjdk.org/jdk/pull/12909#issuecomment-1494641261 From pchilanomate at openjdk.org Mon Apr 3 16:45:08 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 3 Apr 2023 16:45:08 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v13] In-Reply-To: References: <07UH4ks6EGmxIt5mZ3dNPi0YaC8u-xhBNF-Ao9iOAcA=.378b96b5-19e0-4d0a-95d8-83fd44f39024@github.com> Message-ID: On Fri, 31 Mar 2023 05:13:04 GMT, Serguei Spitsyn wrote: >> So the race I am talking about is between the main thread running finishThreads() and the launcher thread running startThreads(). The main thread could execute finishThreads() before the launcher executes startThreads(). If you comment out the first two sleeps in run_test_cycle() you can actually see the issue. Again, given that the sleeps are there it is an unlikely scheduling, but if we want to avoid depending on timing we can add that extra synchronization. > > Sorry, I understood you incorrectly. You are right, there is this kind of race here. > I've rearranged this area a little bit and hope it is cleaner now. > Now, both `startVirtualThreads()` and `finishVirtualThreads()` are invoked on the main thread, so they do not need to be synchronized any more. Also, the calls to `ensureReady()` are moved to `finishVirtualThreads()` right before the call to `letFinish()`. Fix looks good, thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13133#discussion_r1156204813 From coleenp at openjdk.org Mon Apr 3 18:06:00 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Apr 2023 18:06:00 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v3] In-Reply-To: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> References: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> Message-ID: On Sat, 1 Apr 2023 13:27:17 GMT, Coleen Phillimore wrote: >> The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Missing null check Thanks David. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13270#issuecomment-1494751217 From sspitsyn at openjdk.org Mon Apr 3 18:17:13 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Apr 2023 18:17:13 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v16] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. 
> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: set java_lang_Thread::is_in_VTMS_transition bit when notifyJvmti is off ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/c55b6b38..d38e53fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=14-15 Stats: 21 lines in 5 files changed: 13 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From sspitsyn at openjdk.org Mon Apr 3 19:25:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Apr 2023 19:25:11 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: On Wed, 29 Mar 2023 22:15:35 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode > > Hi Serguei, > > I 
took a look at the patch and looks good to me. I have a couple of comments though. > > Thanks, > Patricio @pchilano I've pushed an update where the VTMS transition bit in the VirtualThread object is always/unconditionally set. I do not see any performance impact with this. Please let me know your opinion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1494852395 From kvn at openjdk.org Mon Apr 3 19:31:00 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Apr 2023 19:31:00 GMT Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 18:08:44 GMT, Joe Darcy wrote: > This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. > > This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. Seems fine to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13279#pullrequestreview-1369672551 From pchilanomate at openjdk.org Mon Apr 3 20:04:09 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 3 Apr 2023 20:04:09 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v16] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 18:17:13 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. 
This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > set java_lang_Thread::is_in_VTMS_transition bit when notifyJvmti is off Marked as reviewed by pchilanomate (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13133#pullrequestreview-1369718538 From pchilanomate at openjdk.org Mon Apr 3 20:04:15 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 3 Apr 2023 20:04:15 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v9] In-Reply-To: References: <5SO5rUZwV3SQ2w7t7mOwmP1jXjUVgl4g7NiT7cKi9LU=.355314c8-03ec-4a1e-80d8-e70e98868ecc@github.com> Message-ID: On Wed, 29 Mar 2023 22:15:35 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode > > Hi Serguei, > > I took a look at the patch and looks good to me. I have a couple of comments though. 
> > Thanks, > Patricio > @pchilano I've pushed an update where the VTMS transition bit in the VirtualThread object is always/unconditionally set. I do not see any performance impact with this. Please let me know your opinion. > Latest changes look good to me. Thanks Serguei! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1494901707 From dlong at openjdk.org Mon Apr 3 20:06:59 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 3 Apr 2023 20:06:59 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v3] In-Reply-To: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> References: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> Message-ID: On Sat, 1 Apr 2023 13:27:17 GMT, Coleen Phillimore wrote: >> The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Missing null check Looks good to me. ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13270#pullrequestreview-1369723470 From coleenp at openjdk.org Mon Apr 3 20:26:02 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Apr 2023 20:26:02 GMT Subject: RFR: 8304743: Compile_lock and SystemDictionary updates [v3] In-Reply-To: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> References: <0s-UUL3GboERnmBviWMzivlx6gSHgaMfRWxQNEIN6H8=.2a86c8db-4f63-48bd-a6c6-2f1bb4af6263@github.com> Message-ID: On Sat, 1 Apr 2023 13:27:17 GMT, Coleen Phillimore wrote: >> The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Missing null check Thank you Dean. Thanks for the help with this VladimirI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13270#issuecomment-1494932539 From coleenp at openjdk.org Mon Apr 3 20:30:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Apr 2023 20:30:12 GMT Subject: Integrated: 8304743: Compile_lock and SystemDictionary updates In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 13:23:31 GMT, Coleen Phillimore wrote: > The SystemDictionary is updated and read under the Compile_lock but this is unnecessary because the table is a concurrent hashtable, and the lock doesn't really synchronize any other compilation state. The lock may have protected other state that Dependencies used in the past but has been removed. See discussion in CR for more information. > Tested with tier1-8. This pull request has now been integrated. 
Changeset: b062b1bd Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/b062b1bd8126610d9288dc179d69e54a40b81015 Stats: 78 lines in 4 files changed: 25 ins; 42 del; 11 mod 8304743: Compile_lock and SystemDictionary updates Reviewed-by: vlivanov, dholmes, dlong ------------- PR: https://git.openjdk.org/jdk/pull/13270 From sspitsyn at openjdk.org Mon Apr 3 21:30:13 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Apr 2023 21:30:13 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v16] In-Reply-To: References: Message-ID: <5_xt-s7TgWtBNoLMLIEMEfkf6WZbbsw2HTtDgJx3sbY=.6d71e9ff-2ca5-42c1-a509-5d20ff5ca843@github.com> On Mon, 3 Apr 2023 18:17:13 GMT, Serguei Spitsyn wrote: >> The fix is to enable virtual threads support for late binding JVMTI agents. >> The fix includes: >> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. >> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. >> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. >> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> >> Testing: >> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` >> - The originally failed tests are expected to pass now: >> `runtime/vthread/RedefineClass.java` >> `runtime/vthread/TestObjectAllocationSampleEvent.java` >> - In progress: Run the tiers 1-6 to make sure there are no regression. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > set java_lang_Thread::is_in_VTMS_transition bit when notifyJvmti is off Thank you for review, Patricio! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13133#issuecomment-1495003990 From dholmes at openjdk.org Mon Apr 3 21:33:46 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 3 Apr 2023 21:33:46 GMT Subject: RFR: JDK-8303798: REDO - Remove fdlibm C sources In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 18:08:44 GMT, Joe Darcy wrote: > This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. > > This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13279#pullrequestreview-1369828371 From sspitsyn at openjdk.org Mon Apr 3 21:57:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Apr 2023 21:57:46 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v17] In-Reply-To: References: Message-ID: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. 
> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - Merge - set java_lang_Thread::is_in_VTMS_transition bit when notifyJvmti is off - minor simplification in ToggleNotifyJvmtiTest.java - review: addressed next round of review suggestions - review: tweak in count_transitions_and_correct_jvmti_thread_states - review: minor tweak in test - one more review round fixes - refactored jt->jvmti_thread_state() corrections in VM_SetNotifyJvmtiEventsMode - review: updated correction of jt->jvmti_thread_state() links in VM_SetNotifyJvmtiEventsMode - fixed trailing spaces in two files - ... 
and 10 more: https://git.openjdk.org/jdk/compare/5c31a0bf...b5624011 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/d38e53fd..b5624011 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=15-16 Stats: 27373 lines in 728 files changed: 13694 ins; 9290 del; 4389 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From darcy at openjdk.org Tue Apr 4 00:00:15 2023 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 4 Apr 2023 00:00:15 GMT Subject: Integrated: JDK-8303798: REDO - Remove fdlibm C sources In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 18:08:44 GMT, Joe Darcy wrote: > This PR is a redo of JDK-8302801: Remove fdlibm C sources. The problem with JDK-8302801 was that it neglected (mea culpa) to include a Java implementation of IEEEremainder before the FDLIBM C implementation was deleted. Such an implementation has been successfully provided under JDK-8304028: Port fdlibm IEEEremainder to Java. After JDK-8304028, there are no native methods left in StrictMath. > > This PR is the same as JDK-8302801 other than StrictMath.c already being removed under JDK-8304028. This pull request has now been integrated. 
Changeset: ccbb0e8d Author: Joe Darcy URL: https://git.openjdk.org/jdk/commit/ccbb0e8d8927dff5a424717616468d05015cd002 Stats: 6516 lines in 64 files changed: 20 ins; 6486 del; 10 mod 8303798: REDO - Remove fdlibm C sources Reviewed-by: alanb, iris, jwaters, erikj, kvn, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13279 From david.holmes at oracle.com Tue Apr 4 00:46:58 2023 From: david.holmes at oracle.com (David Holmes) Date: Tue, 4 Apr 2023 10:46:58 +1000 Subject: [PATCH] Added support for fractional values of SafepointTimeoutDelay In-Reply-To: References: Message-ID: <0c36cead-3e76-e2d0-4ee3-66e02f0272bf@oracle.com> Hi Wojciech, Welcome to OpenJDK! I have filed: https://bugs.openjdk.org/browse/JDK-8305506 for this issue. Are you able to create a Pull Request on github for this? Thanks, David On 3/04/2023 5:35 pm, Wojciech KUDLA wrote: > Hi everyone, > > My organization uses Java to run latency-sensitive workloads; we often engage in performance troubleshooting of the compiled code or the VM itself. We typically avoid STW altogether but when a safepoint happens it's always a sub-millisecond pause. The SafepointTimeoutDelay is great for quickly identifying threads that failed to park themselves in a timely manner and are causing long time-to-safepoint issues but the functionality works with millisecond granularity so doesn't cover our use cases at all. We have a very small patch that introduces sub-millisecond granularity that we'd like to share with the community. > We have just signed the OCA and this would be our first contribution so we'd appreciate it if someone could sponsor this. > > The change is very straightforward but to preserve backwards compatibility we decided to replace the type of SafepointTimeoutDelay with double which I imagine might be the topic of various opinions clashing. We're very happy to engage in these discussions and shape the patch to the community's standards. 
> Here it is in its entirety:
>
> ---
>  src/hotspot/share/runtime/globals.hpp             | 7 ++++---
>  src/hotspot/share/runtime/safepoint.cpp           | 4 ++--
>  src/hotspot/share/utilities/globalDefinitions.hpp | 3 +++
>  3 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp
> index 483287e6262..d694758829b 100644
> --- a/src/hotspot/share/runtime/globals.hpp
> +++ b/src/hotspot/share/runtime/globals.hpp
> @@ -1289,9 +1289,10 @@ const int ObjectAlignmentInBytes = 8;
>            "(0 means none)") \
>            range(0, max_jint) \
>    \
> -  product(intx, SafepointTimeoutDelay, 10000, \
> -          "Delay in milliseconds for option SafepointTimeout") \
> -          range(0, max_intx LP64_ONLY(/MICROUNITS)) \
> +  product(double, SafepointTimeoutDelay, 10000, \
> +          "Delay in milliseconds for option SafepointTimeout; " \
> +          "supports sub-millisecond resolution with fractional values.") \
> +          range(0, max_jlongDouble LP64_ONLY(/MICROUNITS)) \
>    \
>    product(bool, UseSystemMemoryBarrier, false, EXPERIMENTAL, \
>            "Try to enable system memory barrier") \
> diff --git a/src/hotspot/share/runtime/safepoint.cpp b/src/hotspot/share/runtime/safepoint.cpp
> index 2ff593a0143..7a27aaf804c 100644
> --- a/src/hotspot/share/runtime/safepoint.cpp
> +++ b/src/hotspot/share/runtime/safepoint.cpp
> @@ -379,7 +379,7 @@ void SafepointSynchronize::begin() {
>    if (SafepointTimeout) {
>      // Set the limit time, so that it can be compared to see if this has taken
>      // too long to complete.
> -    safepoint_limit_time = SafepointTracing::start_of_safepoint() + (jlong)SafepointTimeoutDelay * (NANOUNITS / MILLIUNITS);
> +    safepoint_limit_time = SafepointTracing::start_of_safepoint() + (jlong)SafepointTimeoutDelay * NANOSECS_PER_MILLISEC;
>      timeout_error_printed = false;
>    }
>
> @@ -795,7 +795,7 @@ void SafepointSynchronize::print_safepoint_timeout() {
>          os::naked_sleep(3000);
>        }
>      }
> -    fatal("Safepoint sync time longer than " INTX_FORMAT "ms detected when executing %s.",
> +    fatal("Safepoint sync time longer than " JDOUBLE_FORMAT_P(6) "ms detected when executing %s.",
>            SafepointTimeoutDelay, VMThread::vm_operation()->name());
>    }
>  }
> diff --git a/src/hotspot/share/utilities/globalDefinitions.hpp b/src/hotspot/share/utilities/globalDefinitions.hpp
> index 41ff5150243..1570fde7477 100644
> --- a/src/hotspot/share/utilities/globalDefinitions.hpp
> +++ b/src/hotspot/share/utilities/globalDefinitions.hpp
> @@ -151,6 +151,9 @@ class oopDesc;
>  #define UINTX_FORMAT_X "0x%" PRIxPTR
>  #define UINTX_FORMAT_W(width) "%" #width PRIuPTR
>
> +// Format jdouble with defined precision
> +#define JDOUBLE_FORMAT_P(precision) "%." #precision "f"
> +
>  // Format jlong, if necessary
>  #ifndef JLONG_FORMAT
>  #define JLONG_FORMAT INT64_FORMAT
> --
>
> Kind regards
>
> Wojciech Kudla
> HSBC Bank plc
>
>
> PUBLIC
>
> -----------------------------------------
> SAVE PAPER - THINK BEFORE YOU PRINT!
>
> This E-mail is confidential.
>
> It may also be legally privileged. If you are not the addressee you may not copy,
> forward, disclose or use any part of it. If you have received this message in error,
> please delete it and all copies from your system and notify the sender immediately by
> return E-mail.
>
> Internet communications cannot be guaranteed to be timely secure, error or virus-free.
> The sender does not accept liability for any errors or omissions.
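One detail in the quoted safepoint.cpp hunk may be worth checking during review: as written, the `(jlong)` cast appears to bind to `SafepointTimeoutDelay` before the multiplication, which would drop the fractional part of a value such as 0.5 before it is scaled to nanoseconds. The sketch below illustrates the difference in Java, where a cast also binds more tightly than `*`. It is not HotSpot code; the class and method names are hypothetical, and `NANOSECS_PER_MILLISEC` is redefined locally just for the illustration.

```java
// Hypothetical illustration of cast placement when scaling fractional milliseconds.
class FractionalDelay {
    static final long NANOSECS_PER_MILLISEC = 1_000_000L;

    // Cast binds to the delay first, so the fraction is lost before scaling.
    static long castThenScale(double delayMillis) {
        return (long) delayMillis * NANOSECS_PER_MILLISEC;
    }

    // Scale in floating point first, then convert the product to an integer.
    static long scaleThenCast(double delayMillis) {
        return (long) (delayMillis * NANOSECS_PER_MILLISEC);
    }

    public static void main(String[] args) {
        System.out.println(castThenScale(0.5));  // 0
        System.out.println(scaleThenCast(0.5));  // 500000
        System.out.println(castThenScale(2.75)); // 2000000
        System.out.println(scaleThenCast(2.75)); // 2750000
    }
}
```

If the intent is sub-millisecond resolution, the second form is the one that preserves it; whether the actual patch needs this change is for the author and reviewers to confirm.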
From sspitsyn at openjdk.org Tue Apr 4 00:50:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 4 Apr 2023 00:50:41 GMT Subject: Integrated: 8297286: runtime/vthread tests crashing after JDK-8296324 In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 02:12:12 GMT, Serguei Spitsyn wrote: > The fix is to enable virtual threads support for late binding JVMTI agents. > The fix includes: > - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. > - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing. It is used by the `WhiteBox` API. > - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update. > - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > > Testing: > - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest` > - The originally failed tests are expected to pass now: > `runtime/vthread/RedefineClass.java` > `runtime/vthread/TestObjectAllocationSampleEvent.java` > - In progress: Run the tiers 1-6 to make sure there are no regression. This pull request has now been integrated. 
Changeset: a1a9ec6e Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/a1a9ec6e46b70d5436711f89f4bf603ebacc8060 Stats: 574 lines in 15 files changed: 554 ins; 9 del; 11 mod 8297286: runtime/vthread tests crashing after JDK-8296324 Reviewed-by: lmesnik, pchilanomate, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/13133 From david.holmes at oracle.com Tue Apr 4 00:50:52 2023 From: david.holmes at oracle.com (David Holmes) Date: Tue, 4 Apr 2023 10:50:52 +1000 Subject: [PATCH] Added support for grace period before AbortVMOnSafepointTimeout triggers In-Reply-To: References: Message-ID: Hi Wojciech, I have filed: https://bugs.openjdk.org/browse/JDK-8305507 for this issue. Please create a PR on github if you are able. Thanks, David On 3/04/2023 5:56 pm, Wojciech KUDLA wrote: > Hi everyone, > > Our bank uses Java for low-latency applications and we sometimes need to dig into a long time-to-safepoint pause in greater detail. AbortVMOnSafepointTimeout is extremely useful but we want to avoid putting the JVM in debug while it is still warming up or bootstrapping an application. For that reason we introduced a grace period that avoids triggering the AbortVMOnSafepointTimeout functionality until some preconfigured time after the JVM startup. > The patch is extremely simple and tangential to the SafepointTimeoutDelay patch we submitted this morning: https://mail.openjdk.org/pipermail/hotspot-dev/2023-April/072455.html > As stated in that thread, we recently signed the OCA and this is the second of two patches we'd like to contribute; I think it will require someone to sponsor it. > We're happy to continue discussing this patch here to ensure the change meets the community's requirements. 
> > --- > src/hotspot/share/runtime/globals.hpp | 5 +++++ > src/hotspot/share/runtime/safepoint.cpp | 3 ++- > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp > index 6006063421e..05ff1228050 100644 > --- a/src/hotspot/share/runtime/globals.hpp > +++ b/src/hotspot/share/runtime/globals.hpp > @@ -426,6 +426,11 @@ const int ObjectAlignmentInBytes = 8; > product(bool, AbortVMOnSafepointTimeout, false, DIAGNOSTIC, \ > "Abort upon failure to reach safepoint (see SafepointTimeout)") \ > \ > + product(uint64_t, AbortVMOnSafepointTimeoutDelay, 0, DIAGNOSTIC, \ > + "Enable option AbortVMOnSafepointTimeout after this many " \ > + "milliseconds since JVM startup") \ > + range(0, max_jlong) \ > + \ > product(bool, AbortVMOnVMOperationTimeout, false, DIAGNOSTIC, \ > "Abort upon failure to complete VM operation promptly") \ > \ > diff --git a/src/hotspot/share/runtime/safepoint.cpp b/src/hotspot/share/runtime/safepoint.cpp > index 2ff593a0143..42d41f74d5d 100644 > --- a/src/hotspot/share/runtime/safepoint.cpp > +++ b/src/hotspot/share/runtime/safepoint.cpp > @@ -67,6 +67,7 @@ > #include "runtime/threadSMR.hpp" > #include "runtime/threadWXSetters.inline.hpp" > #include "runtime/timerTrace.hpp" > +#include "services/management.hpp" > #include "services/runtimeService.hpp" > #include "utilities/events.hpp" > #include "utilities/macros.hpp" > @@ -784,7 +785,7 @@ void SafepointSynchronize::print_safepoint_timeout() { > > // To debug the long safepoint, specify both AbortVMOnSafepointTimeout & > // ShowMessageBoxOnError. > - if (AbortVMOnSafepointTimeout) { > + if (AbortVMOnSafepointTimeout && Management::ticks_to_ms(os::elapsed_counter()) > (jlong)AbortVMOnSafepointTimeoutDelay) { > // Send the blocking thread a signal to terminate and write an error file. 
> for (JavaThreadIteratorWithHandle jtiwh; JavaThread *cur_thread = jtiwh.next(); ) { > if (cur_thread->safepoint_state()->is_running()) { > -- From qamai at openjdk.org Tue Apr 4 00:59:21 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 4 Apr 2023 00:59:21 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 12:25:16 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > small cosmetics Thanks, may I integrate the changes now? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1495191197 From psandoz at openjdk.org Tue Apr 4 01:10:18 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 4 Apr 2023 01:10:18 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 00:56:09 GMT, Quan Anh Mai wrote: > Thanks, may I integrate the changes now? You might need another HotSpot reviewer? @vnkozlov is that correct? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1495198225 From xgong at openjdk.org Tue Apr 4 02:40:12 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 4 Apr 2023 02:40:12 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v4] In-Reply-To: References: Message-ID: On Sat, 1 Apr 2023 07:44:25 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. >> >> Please take a look and have some reviews. 
Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>
> - instruction asserts
> - Merge branch 'master' into sliceIntrinsics
> - add comments explaining anonymous classes
> - address reviews
> - sse2, increase warmup
> - aesthetic
> - optimise 64B
> - add jmh
> - vector slice intrinsics

src/hotspot/share/opto/vectorIntrinsics.cpp line 1935:

> 1933: return false; // should be primitive type
> 1934: }
> 1935: BasicType elem_bt = elem_type->basic_type();

Code style: it would be better to add a blank line between different blocks.

src/hotspot/share/opto/vectorIntrinsics.cpp line 1941:

> 1939: if (C->print_intrinsics()) {
> 1940: tty->print_cr(" ** not supported: arity=2 op=slice vlen=%d etype=%s ismask=notused",
> 1941: num_elem, type2name(elem_bt));

`ismask=notused` could be removed. In other intrinsics we use `ismask` to print whether the operation is a vector mask operation rather than a vector class operation.

src/hotspot/share/opto/vectorIntrinsics.cpp line 1954:

> 1952: if (v1 == NULL || v2 == NULL) {
> 1953: return false; // operand unboxing failed
> 1954: }

I suggest reordering line 1950 and the if-statement at line 1952. Then we don't need so many extra spaces in the variable definition `Node* v1 = unbox_vector(xxx)`. Besides, could we rename the variable `o` to `index` or `origin`? I know you've already used `origin` at the beginning; maybe that one can be renamed to `origin_type`. I see a similar naming style in `inline_vector_frombits_coerced`.
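For readers following the slice discussion quoted above, here is a minimal scalar sketch of what "concatenate the two inputs and extract a window" means. It is only a reference model under stated assumptions: `slice_reference` is a made-up name (not a HotSpot or Vector API function), lanes are modelled as a `std::array`, and `origin` is assumed to lie in the range 0..N.

```cpp
#include <array>
#include <cstddef>

// Scalar reference model of a vector slice (illustrative only).
// Conceptually: build the 2N-lane composite [v1 | v2], then return the
// N-lane window that starts at lane `origin`. Assumes 0 <= origin <= N.
template <typename T, std::size_t N>
std::array<T, N> slice_reference(std::size_t origin,
                                 const std::array<T, N>& v1,
                                 const std::array<T, N>& v2) {
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        std::size_t src = origin + i;  // lane index in the 2N-lane composite
        result[i] = (src < N) ? v1[src] : v2[src - N];
    }
    return result;
}
```

With `v1 = {0, 1, 2, 3}`, `v2 = {4, 5, 6, 7}` and `origin = 1`, this model yields `{1, 2, 3, 4}`. An intrinsic such as the `palignr`-based one mentioned in the PR would be expected to produce the same lanes without materializing an index vector or blend mask.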
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1156645453 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1156646311 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1156653402 From kim.barrett at oracle.com Tue Apr 4 03:02:21 2023 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 4 Apr 2023 03:02:21 +0000 Subject: ALLOW_C_FUNCTION's first parameter doesn't seem very helpful In-Reply-To: References: Message-ID: <010B73C4-8F55-443E-9C2D-9E3E38B289AE@oracle.com> > On Apr 2, 2023, at 10:26 AM, Julian Waters wrote: > > Hi everyone, > > I couldn't help but notice that the method name passed to the macro to allow use of itself isn't really needed in any way. Comments in the macro definition say it's to show exactly what method is being permitted by the macro, but to me it seems the actual call to it that happens in the second half of the macro already does that job entirely > > // It's already obvious from the actual method call itself that we want std::_Exit > ALLOW_C_FUNCTION(::_Exit, ::_Exit(code);) > > Is there a reason we still keep the first parameter around other than for documentation purposes? The using-statement is not always a simple call expression. (We have many that aren't.) There could be some other function call that is directly providing an argument to the function being permitted. Or the use being permitted could be directly providing an argument to some other function. In such statements, which function is being allowed? (I think there are currently no uses of either of those argument passing forms. Also, restructuring such code could probably dodge that question.) More importantly, some implementation approaches I tried for some platforms needed that information. I didn't get any such to work, but maybe someone more clever will find a way. Given that there really shouldn't be a lot of these, I'm not too worried about making the syntax easy to use. 
From dholmes at openjdk.org Tue Apr 4 04:54:08 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Apr 2023 04:54:08 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v5] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 08:29:55 GMT, Johannes Bechberger wrote: >> Fixes the issue by transitioning the thread into the WXWrite mode while walking the stack in AsyncGetCallTrace. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix fix > - Fix minor issues Can you update the description please to reflect the final approach. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1495340463 From dholmes at openjdk.org Tue Apr 4 06:49:50 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Apr 2023 06:49:50 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v35] In-Reply-To: References: <-xn2Y5lZ-63av1QhbJNDx4saAzgmFT9hE394alHUFHI=.ae58b01f-d6ef-45f3-8e42-fc7b7a552b8e@github.com> Message-ID: On Thu, 30 Mar 2023 14:30:28 GMT, Roman Kennke wrote: >> Please explain why you think this is "not safe". Yes, you can observe state that is in >> the process of changing, but do you think that we'll see a crash with allowing >> `Threads::owning_thread_from_object()` to be called from a non-safepoint place? > > I don't think we'd see a crash, but we might get false results when we are scanning the lock-stack of a foreign thread, when that thread does not hold still. I'm not even comfortable doing that cross-stack lock query with the old code. Given the owner could release the monitor the moment after we check I don't see how false results are an issue here.
The existing code should be safe when not executed at a safepoint.. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156792723 From dholmes at openjdk.org Tue Apr 4 06:49:49 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Apr 2023 06:49:49 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> Message-ID: On Mon, 3 Apr 2023 11:05:41 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is answered very quickly by a short scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now always be done in the fast-path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix typo

Thanks for previous updates - much appreciated. Another round of comments and some concerns.

One thing I can't quite get clear in my head is whether the small window where an object's monitor is inflated and the object is still in the thread's lock-stack could cause an issue for any external observers trying to determine the object's locked state.
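To make the lock-stack described in the quoted PR text a little more concrete, here is a toy model of a small, fixed-capacity per-thread array of object references. This is only a sketch under assumptions: `ToyLockStack` and its member names are invented for illustration and are not HotSpot's `LockStack` API, object references are modelled as plain pointers, and nothing here deals with GC roots, inflation or concurrency.

```cpp
#include <array>
#include <cstddef>

// Toy model of a per-thread lock-stack (illustrative names, not HotSpot code).
class ToyLockStack {
public:
    static constexpr std::size_t kCapacity = 8;  // capacity quoted in the PR text

    // Returns false when the stack is full; a real implementation would then
    // take the slow path and inflate the lock to a monitor.
    bool try_push(const void* obj) {
        if (top_ == kCapacity) return false;
        slots_[top_++] = obj;
        return true;
    }

    // "Does the current thread own this lock?" is a short linear scan.
    bool contains(const void* obj) const {
        for (std::size_t i = 0; i < top_; ++i) {
            if (slots_[i] == obj) return true;
        }
        return false;
    }

    // Removes an entry from any position and shifts later entries down, so
    // entries do not have to be released in strict LIFO order.
    bool remove(const void* obj) {
        for (std::size_t i = 0; i < top_; ++i) {
            if (slots_[i] == obj) {
                for (std::size_t j = i + 1; j < top_; ++j) slots_[j - 1] = slots_[j];
                --top_;
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return top_; }

private:
    std::array<const void*, kCapacity> slots_{};
    std::size_t top_ = 0;
};
```

In this model, pushing A then B and removing A first leaves only B on the stack, which is the kind of balanced-but-not-nested sequence a simple "decrement top" pop would not handle. Whether the actual patch needs position-independent removal on a given code path is a question for the review, not something this sketch settles.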
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9739:

> 9737: get_thread(thread);
> 9738: #endif
> 9739: subl(Address(thread, JavaThread::lock_stack_top_offset()), oopSize);

Is this code used for monitorexit or only when returning from synchronized methods? If it is used for monitorexit, there is no requirement that the monitor being unlocked was the last monitor locked. Balanced locking only requires that the locks and unlocks are matched, not that they are perfectly nested, i.e. this is valid in bytecode:

    monitorenter A
    monitorenter B
    monitorexit A
    monitorexit B

src/hotspot/share/runtime/javaThread.cpp line 1388:

> 1386: }
> 1387:
> 1388: if (!UseHeavyMonitors && LockingMode == 2) {

Given UseHeavyMonitors implies LockingMode==0 it suffices to just check LockingMode==2 here.

src/hotspot/share/runtime/lockStack.inline.hpp line 53:

> 51: if (!thread->is_Java_thread()) {
> 52: return false;
> 53: }

I'm still unclear how we can have non-JavaThreads here. Only JavaThreads can lock object monitors.

src/hotspot/share/runtime/synchronizer.cpp line 1283:

> 1281: inf->set_owner_from_anonymous(current);
> 1282: assert(current->is_Java_thread(), "must be Java thread");
> 1283: JavaThread::cast(current)->lock_stack().remove(object);

JavaThread::cast already asserts it is a JavaThread.

src/hotspot/share/runtime/synchronizer.cpp line 1314:

> 1312: LogStreamHandle(Trace, monitorinflation) lsh;
> 1313: if (mark.is_fast_locked()) {
> 1314: assert(LockingMode == 2, "can only happen with new lightweight locking");

I'd rather see this entire case guarded by "`if (LockingMode == 2)`" and have `is_fast_locked()` assert if called in any other mode. Similarly for the stack-locked case below.

src/hotspot/share/runtime/synchronizer.cpp line 1330:

> 1328: // Success! Return inflated monitor.
> 1329: if (own) {
> 1330: assert(current->is_Java_thread(), "must be: checked in is_lock_owned()");

Again this assert is not needed as `JavaThread::cast` will perform it.
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaVFrame.java line 85: > 83: ( // we have marked ourself as pending on this monitor > 84: mark.monitor().equals(thread.getCurrentPendingMonitor()) || > 85: mark.monitor().isOwnedAnonymous() || Not at all clear to me how this fits here. ?? ------------- PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1370203589 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156763358 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156767200 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156736240 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156785509 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156787971 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156789340 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156797838 From rrich at openjdk.org Tue Apr 4 07:24:26 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 4 Apr 2023 07:24:26 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v5] In-Reply-To: References: Message-ID: On Wed, 29 Mar 2023 19:54:50 GMT, Richard Reingruber wrote: >> This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. >> >> The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. >> >> C2i entry barriers can be removed for the same reason. >> >> Testing: >> >> Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. 
>> >> I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. >> >> I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Check only in debug vm > - Merge branch 'master' > - Merge branch 'master' > - Fix > - Feedback Coleene > - Adding TestStaticallyBoundTargetIsReachable.java > - Path to target exists also if the receiver is a constant of the caller > - Use nmethod::oops_do() to search for to_holder in from_nm > - Merge branch 'master' > - Remove MacroAssembler::resolve_weak_handle() > - ... and 4 more: https://git.openjdk.org/jdk/compare/ff368d50...7dd06446 Closing this PR for now (with the assumption that comments are still possible). ------------- PR Comment: https://git.openjdk.org/jdk/pull/12802#issuecomment-1495472376 From rrich at openjdk.org Tue Apr 4 07:24:27 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 4 Apr 2023 07:24:27 GMT Subject: Withdrawn: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. 
> > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12802 From dholmes at openjdk.org Tue Apr 4 07:28:10 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Apr 2023 07:28:10 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v5] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 08:29:55 GMT, Johannes Bechberger wrote: >> Fixes the issue by disabling PCDesc cache modifications when in ASGCT. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix fix > - Fix minor issues Can I just clarify the new approach please. IIUC if we update the PcDesc cache whilst within ASGCT then we hit code that needs WXWrite mode. So the fix is to just skip updating the cache if ASGCT is active. I have no idea what the PcDesc cache is but I presume if it is really a cache then not updating it only potentially affects lookup performance - right? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1495349989 From jbechberger at openjdk.org Tue Apr 4 07:32:10 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 4 Apr 2023 07:32:10 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v5] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 08:29:55 GMT, Johannes Bechberger wrote: >> Fixes the issue by disabling PCDesc cache modifications when in ASGCT. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix fix > - Fix minor issues You're right and I updated the description of the PR. 
I gave some performance figures in a previous comment, they show that the performance impact is probably negligible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1495482316 From haosun at openjdk.org Tue Apr 4 08:09:04 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 4 Apr 2023 08:09:04 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret Message-ID: ### Background 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. ### Goal This patch aims to make PAC-RET compatible with virtual threads. ### Requirements of virtual threads R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. Note that more details can be found in the discussion [3]. ### Investigation We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. 1. 
SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works.

2. Thread ID (tid): It's invariant for a virtual thread, but it's nontrivial to access it from the JIT side. We need to 1) first resolve the address of the current thread (see [8] as an example), and 2) get the tid field in a way similar to java_lang_Thread::thread_id(). I suppose this would introduce a big performance overhead. Could we then use the "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not invariant for virtual threads, and a PAC re-sign would still be needed.

3. PACStack uses the signed return address of the caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier has to be saved somewhere around the frame record. Inevitably, FP would have to be preserved to make it easy to find this modifier in some exception scenarios (recall the reason why we fail to use SP as the modifier).

Finally, we chose to use the value zero as the modifier. Trivially, it's compatible with virtual threads. However, compared to the FP modifier, this solution reduces the strength of PAC-RET protection to some extent. E.g., you get the same authentication code for each call to the function, whereas using FP gives you different codes as long as the stack depth is different.

### Implementation of Zero modifier

Here we list the key updates of this patch.

1. vm_version_aarch64.cpp

Remove the constraint on "enable-preview" and "PreserveFramePointer".

2. macroAssembler_aarch64.cpp

For utility protect_return_address(), 1) use PACIAZ/PACIZA instructions directly. 2) argument "temp_reg" is removed since all functions use the same modifier. 3) all the use sites are updated accordingly. This involves updates in many files.
Similar updates are done to utility authenticate_return_address(). Besides, aarch64.ad and AArch64TestAssembler.java are updated accordingly. 3. pauth_linux_aarch64.inline.hpp For utilities pauth_sign_return_address() and pauth_authenticate_return_address(), remove the second argument and pass the value zero in register r16. Similarly, all the use sites are updated as well. This involves updates in many files. 4. continuationHelper_aarch64.inline.hpp Introduce return_pc_at() and patch_pc_at() to avoid directly reading the saved PC or writing a new signed PC on the stack in shared code. 5. Minor updates (1) sharedRuntime_aarch64.cpp: Add the missing authenticate_return_address() use for function gen_continuation_enter(). In functions generate_deopt_blob() and generate_uncommon_trap_blob(), remove the authentication on the caller (3) frame since the return address is not used. (2) stubGenerator_aarch64.cpp: Add the missing authenticate_return_address() use for function generate_cont_thaw(). (3) runtime.cpp: enable the authentication. ### Test 1. Cross compilations on arm32/s390/ppc/riscv passed. 2. zero build and x86 build passed. 3. tier1~3 passed on Linux/AArch64 w/ and w/o PAC-RET.
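To make requirement R-2 above more concrete, here is a toy C++ model. It is not HotSpot code and not real pointer authentication: `toy_tag`, `toy_sign` and `toy_auth` are hypothetical stand-ins that use a trivial XOR instead of the keyed hardware cipher, and the addresses are made up. The sketch only illustrates why a signature that uses SP as the modifier stops authenticating once a frame has been copied to a different stack address, while a signature with a zero modifier survives the copy unchanged.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Toy model only: real PAC uses a secret key and a hardware cipher.
// All names here are hypothetical and do not exist in HotSpot.
constexpr u64 kAddrMask = 0x0000FFFFFFFFFFFFULL;  // low 48 bits hold the address

// Derive a 16-bit tag from the address and the modifier.
inline u64 toy_tag(u64 addr, u64 modifier) {
  return (addr ^ modifier ^ 0x5A5AULL) & 0xFFFFULL;
}

// "Sign": place the tag in the otherwise unused top 16 bits.
inline u64 toy_sign(u64 addr, u64 modifier) {
  addr &= kAddrMask;
  return addr | (toy_tag(addr, modifier) << 48);
}

// "Authenticate": recompute the tag with the given modifier and compare.
inline bool toy_auth(u64 signed_addr, u64 modifier) {
  return toy_sign(signed_addr & kAddrMask, modifier) == signed_addr;
}

// Simulates freeze/thaw: the signed value is copied verbatim, but the
// frame ends up at a different stack address after the thaw.
inline void demo_freeze_thaw() {
  const u64 ret_addr     = 0x0000AAAABBBBCC00ULL;  // made-up return address
  const u64 sp_at_freeze = 0x00007FFFFFFFF000ULL;  // made-up SP before freeze
  const u64 sp_at_thaw   = 0x00007FFFFFFFE800ULL;  // made-up SP after thaw

  // SP as modifier: the copied signature no longer authenticates.
  const u64 signed_with_sp = toy_sign(ret_addr, sp_at_freeze);
  assert(toy_auth(signed_with_sp, sp_at_freeze));
  assert(!toy_auth(signed_with_sp, sp_at_thaw));

  // Zero as modifier: the signature does not depend on the frame location.
  const u64 signed_with_zero = toy_sign(ret_addr, 0);
  assert(toy_auth(signed_with_zero, 0));
}
```

The same model also shows the trade-off mentioned in the description: with a constant modifier, every signature of a given return address is identical, regardless of stack depth.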
Co-Developed-by: Nick Gasson [1] https://bugs.openjdk.org/browse/JDK-8277204 [2] https://openjdk.org/jeps/425 [3] https://github.com/openjdk/jdk/pull/9067 [4] https://bugs.openjdk.org/browse/JDK-8288023 [5] https://bugs.openjdk.org/browse/JDK-8301819 [6] https://openjdk.org/jeps/444 [7] https://www.usenix.org/conference/usenixsecurity21/presentation/liljestrand [8] https://github.com/openjdk/jdk/pull/10441 ------------- Commit messages: - 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret Changes: https://git.openjdk.org/jdk/pull/13322/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8287325 Stats: 180 lines in 29 files changed: 68 ins; 27 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From stuefe at openjdk.org Tue Apr 4 08:20:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Apr 2023 08:20:44 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> Message-ID: <2f4eLmsH_dQMV6eQfr1PR0nOIjWWMqe2LDzIk48kBHw=.7bd76dee-a18a-426f-8ea9-cd17e0717352@github.com> On Tue, 4 Apr 2023 05:54:09 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix typo > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9739: > >> 9737: get_thread(thread); >> 9738: #endif >> 9739: subl(Address(thread, JavaThread::lock_stack_top_offset()), oopSize); > > Is this code used for monitorexit or only returning from synchronized methods? If used for monitorexit there is no requirement that the monitor being unlocked was the last monitor locked. 
Balanced locking only requires that locks and unlocks are matched, not that they are perfectly nested, i.e. this is valid in bytecode: > > monitorenter A > monitorenter B > monitorexit A > monitorexit B That is done one layer up in InterpreterMacroAssembler::unlock_object. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1156895293 From aph at openjdk.org Tue Apr 4 09:01:09 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 4 Apr 2023 09:01:09 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 08:00:20 GMT, Hao Sun wrote: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame.
> > [...]
> > Co-Developed-by: Nick Gasson > > [1] https://bugs.openjdk.org/browse/JDK-8277204 > [2] https://openjdk.org/jeps/425 > [3] https://github.com/openjdk/jdk/pull/9067 > [4] https://bugs.openjdk.org/browse/JDK-8288023 > [5] https://bugs.openjdk.org/browse/JDK-8301819 > [6] https://openjdk.org/jeps/444 > [7] https://www.usenix.org/conference/usenixsecurity21/presentation/liljestrand > [8] https://github.com/openjdk/jdk/pull/10441 src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp line 72: > 70: > 71: inline address ContinuationHelper::return_pc_at(intptr_t* sp) { > 72: return pauth_strip_pointer(*(address*)sp); This is the return address. It's called `return_address` elsewhere. src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp line 76: > 74: > 75: inline void ContinuationHelper::patch_pc_at(intptr_t* sp, address pc) { > 76: *(address*)sp = pauth_sign_return_address(pc); This is a bad name. We're not patching the PC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1156944913 PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1156942419 From mgronlun at openjdk.org Tue Apr 4 10:58:15 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 4 Apr 2023 10:58:15 GMT Subject: RFR: 8304033: JFR: Missing thread In-Reply-To: <8VgnMZiSIQQfSwiEg9Pi3ZrJ3qIUDdXDrxOibqKrsME=.0398a1ed-6b7a-45a5-8ccb-af89b8893ddd@github.com> References: <8VgnMZiSIQQfSwiEg9Pi3ZrJ3qIUDdXDrxOibqKrsME=.0398a1ed-6b7a-45a5-8ccb-af89b8893ddd@github.com> Message-ID: On Sun, 2 Apr 2023 23:08:13 GMT, David Holmes wrote: >> Greetings, >> >> please help review this small adjustment to fix the lack of thread information in certain situations, more specifically associated with JNI_AttachThread and JNI_DetachThread. The old site posting a thread start event is correct in getting the correct thread id, but the thread does not write its checkpoint at that location, which is required after JFR Event Streaming.
>> >> The fix is to let the sites in jni.cpp go through the "normal" thread start entry point. >> >> Testing: jdk_jfr >> >> Thanks >> Markus > > Seems reasonable ... though does beg the question why it wasn't done this way from the start? Thanks @dholmes-ora and @egahlin for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13272#issuecomment-1495760300 From mgronlun at openjdk.org Tue Apr 4 10:58:17 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 4 Apr 2023 10:58:17 GMT Subject: Integrated: 8304033: JFR: Missing thread In-Reply-To: References: Message-ID: <8mQxHmx_HFtIbXQYKSPjaomfncJ8vo_wVQWNxX3K8-M=.c1764fe6-40d9-4986-8838-21c4706b1af4@github.com> On Fri, 31 Mar 2023 16:42:55 GMT, Markus Grönlund wrote: > Greetings, > > please help review this small adjustment to fix the lack of thread information in certain situations, more specifically associated with JNI_AttachThread and JNI_DetachThread. The old site posting a thread start event is correct in getting the correct thread id, but the thread does not write its checkpoint at that location, which is required after JFR Event Streaming. > > The fix is to let the sites in jni.cpp go through the "normal" thread start entry point. > > Testing: jdk_jfr > > Thanks > Markus This pull request has now been integrated.
Changeset: 7ca2aec3 Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/7ca2aec34c9b15227570893d9577b306095de40e Stats: 25 lines in 1 file changed: 3 ins; 20 del; 2 mod 8304033: JFR: Missing thread Reviewed-by: egahlin, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13272 From dholmes at openjdk.org Tue Apr 4 12:28:47 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Apr 2023 12:28:47 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: <2f4eLmsH_dQMV6eQfr1PR0nOIjWWMqe2LDzIk48kBHw=.7bd76dee-a18a-426f-8ea9-cd17e0717352@github.com> References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> <2f4eLmsH_dQMV6eQfr1PR0nOIjWWMqe2LDzIk48kBHw=.7bd76dee-a18a-426f-8ea9-cd17e0717352@github.com> Message-ID: On Tue, 4 Apr 2023 08:17:14 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9739: >> >>> 9737: get_thread(thread); >>> 9738: #endif >>> 9739: subl(Address(thread, JavaThread::lock_stack_top_offset()), oopSize); >> >> Is this code used for monitorexit or only returning from synchronized methods? If used for monitorexit there is no requirement that the monitor being unlocked was the last monitor locked. Balanced locking only requires that locks and unlocks are matched, not that they are perfectly nested, i.e. this is valid in bytecode: >> >> monitorenter A >> monitorenter B >> monitorexit A >> monitorexit B > > That is done one layer up in InterpreterMacroAssembler::unlock_object. Thanks @tstuefe. I see at that level if the object doesn't match the top of the lock-stack then we take the slow path. But then I'm lost - AFAICS the slow path is `InterpreterRuntime::monitorexit` and that doesn't have any fast-locking code in it at all ???
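The non-nested unlock order discussed above can be played through on a toy model of a per-thread lock stack. This is only a sketch, not the PR's implementation: `ToyLockStack` and its members are hypothetical names, the capacity of 8 is taken from the PR description, and the `remove` helper is merely one assumption about what a slow path could do when the object being unlocked is not on top of the stack.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of a per-thread lock stack; not HotSpot's implementation.
class ToyLockStack {
 public:
  static constexpr std::size_t kCapacity = 8;  // size named in the PR description

  bool is_full() const { return top_ == kCapacity; }
  std::size_t size() const { return top_; }

  // Fast-path lock: push the object unless the stack is full.
  bool try_push(const void* obj) {
    if (is_full()) return false;  // caller would fall back to the slow path
    slots_[top_++] = obj;
    return true;
  }

  // Fast-path unlock: succeeds only if obj was the most recently locked object.
  bool try_pop(const void* obj) {
    if (top_ == 0 || slots_[top_ - 1] != obj) return false;
    --top_;
    return true;
  }

  // "Does the current thread own me?" answered by a linear scan.
  bool contains(const void* obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) return true;
    }
    return false;
  }

  // Assumed slow-path helper: remove obj from anywhere in the stack and
  // close the gap so the remaining entries keep their order.
  bool remove(const void* obj) {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) {
        for (std::size_t j = i + 1; j < top_; ++j) slots_[j - 1] = slots_[j];
        --top_;
        return true;
      }
    }
    return false;
  }

 private:
  std::array<const void*, kCapacity> slots_{};
  std::size_t top_ = 0;
};
```

With the sequence `monitorenter A; monitorenter B; monitorexit A; monitorexit B`, `try_pop(&A)` fails because B is on top, so the unlock of A has to leave the fast path; after A has been removed some other way, `try_pop(&B)` succeeds.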
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157169176 From dholmes at openjdk.org Tue Apr 4 12:31:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Apr 2023 12:31:13 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v5] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 08:29:55 GMT, Johannes Bechberger wrote: >> Fixes the issue by disabling PCDesc cache modifications when in ASGCT. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix fix > - Fix minor issues Nothing further from me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13144#pullrequestreview-1370866393 From stuefe at openjdk.org Tue Apr 4 13:01:04 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Apr 2023 13:01:04 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> Message-ID: On Mon, 3 Apr 2023 11:05:41 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a recursive lock is attempted, the fast-lock gets inflated to a full monitor.
It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix typo

Can we have named constants for LockingMode, please? Something grepable?
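As an illustration of the request, named constants could look roughly like the sketch below. The enumerator names are hypothetical and are not taken from the PR. The thread only establishes that the value 2 selects fast-locking (see the `LockingMode == 2` check mentioned in the follow-up); the meanings attached to 0 and 1 here are assumptions made for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names for illustration; they are not taken from the PR.
// Only the value 2 (fast-locking) is confirmed by the discussion; the
// meanings given to 0 and 1 are assumptions.
enum ToyLockingMode : int {
  LM_MONITOR_ONLY  = 0,  // assumed: always use full monitors
  LM_STACK_LOCKING = 1,  // assumed: the existing stack-locking
  LM_FAST_LOCKING  = 2   // fast-locking, i.e. the `LockingMode == 2` case
};

// A searchable replacement for a bare `mode == 2` comparison.
inline bool uses_fast_locking(int mode) {
  return mode == LM_FAST_LOCKING;
}

// Human-readable name for logging or error messages.
inline const char* locking_mode_name(int mode) {
  switch (mode) {
    case LM_MONITOR_ONLY:  return "monitor-only";
    case LM_STACK_LOCKING: return "stack-locking";
    case LM_FAST_LOCKING:  return "fast-locking";
    default:               return "unknown";
  }
}

// Small self-check that the names map to the numeric values used so far.
inline void demo_locking_mode() {
  assert(uses_fast_locking(2));
  assert(!uses_fast_locking(1));
  assert(std::strcmp(locking_mode_name(2), "fast-locking") == 0);
}
```

Searching the sources for `LM_FAST_LOCKING` would then find every fast-locking branch, which a search for the literal `2` cannot do.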
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1495928668 From stuefe at openjdk.org Tue Apr 4 13:01:04 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Apr 2023 13:01:04 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> <2f4eLmsH_dQMV6eQfr1PR0nOIjWWMqe2LDzIk48kBHw=.7bd76dee-a18a-426f-8ea9-cd17e0717352@github.com> Message-ID: On Tue, 4 Apr 2023 12:25:19 GMT, David Holmes wrote: >> That is done one layer up in InterpreterMacroAssembler::unlock_object. > > Thanks @tstuefe . I see at that level if the object doesn't match the top of the lock-stack then we take the slow path. But then I'm lost - AFAICS the slow path is `InterpreterRuntime::monitorexit` and that doesn't have any fast-locking code in it at all ??? I'm not sure what you mean. `InterpreterRuntime::monitorexit` will enter `ObjectSynchronizer::exit` which handles the fast-locking case under `if (LockingMode == 2)...`. Or am I misunderstanding you? (I really wish for named constants instead of `1` and `2` constants though...) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157209193 From qamai at openjdk.org Tue Apr 4 13:24:18 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 4 Apr 2023 13:24:18 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v5] In-Reply-To: References: Message-ID: > `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. 
As a result, I think it is beneficial to add a C2 node for this operation as well as to intrinsify the `Vector::slice` method. > > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. > > Please take a look and leave some reviews. Thank you very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add identity, fix flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12909/files - new: https://git.openjdk.org/jdk/pull/12909/files/bedb73bd..e68e215d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=03-04 Stats: 42 lines in 4 files changed: 19 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/12909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12909/head:pull/12909 PR: https://git.openjdk.org/jdk/pull/12909 From qamai at openjdk.org Tue Apr 4 13:46:12 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 4 Apr 2023 13:46:12 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: > `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation.
x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as to intrinsify the `Vector::slice` method. > > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. > > Please take a look and leave some reviews. Thank you very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12909/files - new: https://git.openjdk.org/jdk/pull/12909/files/e68e215d..a17942f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=04-05 Stats: 13 lines in 1 file changed: 4 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12909/head:pull/12909 PR: https://git.openjdk.org/jdk/pull/12909 From mgronlun at openjdk.org Tue Apr 4 14:39:19 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 4 Apr 2023 14:39:19 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime.
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initializationTime = 12:31:15.574 (2023-03-08) > initializationDuration = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initializationTime = 12:31:31.037 (2023-03-08) > initializationDuration = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime".
> > An agent can also be written in a native programming language using the [JVM Tool Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initializationTime = 12:31:36.142 (2023-03-08) > initializationDuration = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
> > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: renames ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/07407a82..d0fd9e97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=14-15 Stats: 2158 lines in 19 files changed: 1059 ins; 1059 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From coleenp at openjdk.org Tue Apr 4 14:40:03 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Apr 2023 14:40:03 GMT Subject: RFR: 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" Message-ID: I tested the fix for JDK-8304743 on tier1-8, then did this refactoring which 
required a ciKlass::get_Klass() call that failed. This fix reverts the refactoring. Tested with tier4. ------------- Commit messages: - 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" Changes: https://git.openjdk.org/jdk/pull/13327/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13327&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305509 Stats: 44 lines in 4 files changed: 15 ins; 24 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13327.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13327/head:pull/13327 PR: https://git.openjdk.org/jdk/pull/13327 From mgronlun at openjdk.org Tue Apr 4 14:44:13 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 4 Apr 2023 14:44:13 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v15] In-Reply-To: References: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com> Message-ID: <96JFUTmjTmM45mitEipdK8x2jqjo2uhqF42031X6LRQ=.11d4b13f-7d95-426a-8fc9-50e7dfb8f8ab@github.com> On Sat, 1 Apr 2023 03:31:48 GMT, Serguei Spitsyn wrote: >> Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: >> >> fixes > > src/hotspot/share/prims/agent.hpp line 1: > >> 1: /* > > The name for class and file is too general. > I'm thinking if renaming the files to jvmtiAgent and the class to JvmtiAgent would work. > In general, there exists a convention to name JVMTI file with the "jvmti" prefix. > It is a gray zone between Runtime and JVMTI but seems to belong more to JVMTI. > The same about the AgentList class and file. 
> Also, these new files are good candidates to add here: > > make/hotspot/lib/JvmFeatures.gmk: > ifneq ($(call check-jvm-feature, jvmti), true) > JVM_CFLAGS_FEATURES += -DINCLUDE_JVMTI=0 > JVM_EXCLUDE_FILES += jvmtiGetLoadedClasses.cpp jvmtiThreadState.cpp jvmtiExtensions.cpp \ > jvmtiImpl.cpp jvmtiManageCapabilities.cpp jvmtiRawMonitor.cpp jvmtiUtil.cpp jvmtiTrace.cpp \ > jvmtiCodeBlobEvents.cpp jvmtiEnv.cpp jvmtiRedefineClasses.cpp jvmtiEnvBase.cpp jvmtiEnvThreadState.cpp \ > jvmtiTagMap.cpp jvmtiEventController.cpp evmCompat.cpp jvmtiEnter.xsl jvmtiExport.cpp \ > jvmtiClassFileReconstituter.cpp jvmtiTagMapTable.cpp > endif Hi Serguei, thanks for taking a look. I have updated with the names you suggested. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1157362083 From qamai at openjdk.org Tue Apr 4 14:57:18 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 4 Apr 2023 14:57:18 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v4] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 16:36:08 GMT, Paul Sandoz wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - instruction asserts >> - Merge branch 'master' into sliceIntrinsics >> - add comments explaining anonymous classes >> - address reviews >> - sse2, increase warmup >> - aesthetic >> - optimise 64B >> - add jmh >> - vector slice intrinsics > > With the latest PR I am observing failures with debug builds for test compiler/vectorapi/TestVectorSlice.java on both AVX512 machines and aarch64 machines.
> > On AVX512 machines the test fails with JVM args `-XX:UseAVX=3` and `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` and results in a test assertion failure e.g., > > Caused by: java.lang.RuntimeException: assertEquals: expected 70 to equal 0 > at jdk.test.lib.Asserts.fail(Asserts.java:594) > at jdk.test.lib.Asserts.assertEquals(Asserts.java:205) > at jdk.test.lib.Asserts.assertEquals(Asserts.java:189) > at compiler.vectorapi.TestVectorSlice.lambda$testInts$2(TestVectorSlice.java:163) > at compiler.vectorapi.TestVectorSlice.testInts(TestVectorSlice.java:181) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > ... 7 more > > > CPU flags are: > > fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant tsc arch perfmon rep good nopl xtopology cpuid tsc known freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4 1 sse4 2 x2apic movbe popcnt tsc deadline timer aes xsave avx f16c rdrand hypervisor lahf lm abm 3dnowprefetch cpuid fault invpcid single ssbd ibrs ibpb stibp ibrs enhanced tpr shadow vnmi flexpriority ept vpid ept ad fsgsbase tsc adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt good wbnoinvd arat avx512vbmi umip pku ospke avx512 vbmi2 gfni vaes vpclmulqdq avx512 vnni avx512 bitalg avx512 vpopcntdq la57 rdpid md clear arch capabilities > > > On aarch64 there is an IR rule failure. @PaulSandoz I have fixed the error in AVX512 and added feature predicates to not do IR check on AArch64 @XiaohongGong Thanks for your reviews, I have addressed them ------------- PR Comment: https://git.openjdk.org/jdk/pull/12909#issuecomment-1496115432 From dcubed at openjdk.org Tue Apr 4 16:06:06 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 4 Apr 2023 16:06:06 GMT Subject: RFR: 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:32:42 GMT, Coleen Phillimore wrote: > I tested the fix for JDK-8304743 on tier1-8, then did this refactoring which required a ciKlass::get_Klass() call that failed. This fix reverts the refactoring. > > Here was the commit for JDK-8304743: https://github.com/openjdk/jdk/commit/b062b1bd8126610d9288dc179d69e54a40b81015 > > I added the HandleMark to the utility function. It wasn't there in the original code. > > Tested with tier4. I agree that this patch is a clean reversion of the refactoring changes made in the https://github.com/openjdk/jdk/pull/13270 PR. So this work returns the state of this patch to where you did Mach5 Tier[1-8] testing. You didn't declare this fix to be trivial, but I consider this to be a trivial [BACKOUT] of a portion of the fix for: [JDK-8304743](https://bugs.openjdk.org/browse/JDK-8304743) Compile_lock and SystemDictionary updates ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13327#pullrequestreview-1371317979 From coleenp at openjdk.org Tue Apr 4 16:10:13 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Apr 2023 16:10:13 GMT Subject: RFR: 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 16:03:11 GMT, Daniel D. Daugherty wrote: >> I tested the fix for JDK-8304743 on tier1-8, then did this refactoring which required a ciKlass::get_Klass() call that failed. This fix reverts the refactoring. >> >> Here was the commit for JDK-8304743: https://github.com/openjdk/jdk/commit/b062b1bd8126610d9288dc179d69e54a40b81015 >> >> I added the HandleMark to the utility function. It wasn't there in the original code. >> >> Tested with tier4.
> > I agree that this patch is a clean reversion of the refactoring > changes made in the https://github.com/openjdk/jdk/pull/13270 PR. > So this work returns the state of this patch to where you did Mach5 > Tier[1-8] testing. > > You didn't declare this fix to be trivial, but I consider this to be a > trivial [BACKOUT] of a portion of the fix for: > > [JDK-8304743](https://bugs.openjdk.org/browse/JDK-8304743) Compile_lock and SystemDictionary updates @dcubed-ojdk Thank you Dan for reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13327#issuecomment-1496240382 From coleenp at openjdk.org Tue Apr 4 16:35:17 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Apr 2023 16:35:17 GMT Subject: RFR: 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" In-Reply-To: References: Message-ID: <3NFW0gsHYfC6CCZlM207LhwpsAM7c-ls3_7KoS6WZmo=.9b1bf560-eb64-445a-b1e2-90ae19b96676@github.com> On Tue, 4 Apr 2023 14:32:42 GMT, Coleen Phillimore wrote: > I tested the fix for JDK-8304743 on tier1-8, then did this refactoring which required a ciKlass::get_Klass() call that failed. This fix reverts the refactoring. > > Here was the commit for JDK-8304743: https://github.com/openjdk/jdk/commit/b062b1bd8126610d9288dc179d69e54a40b81015 > > I added the HandleMark to the utility function. It wasn't there in the original code. > > Tested with tier4. Thank you for specifying trivial as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13327#issuecomment-1496271476 From coleenp at openjdk.org Tue Apr 4 16:35:18 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Apr 2023 16:35:18 GMT Subject: Integrated: 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" In-Reply-To: References: Message-ID: <9FndyjBkMaBmq2B6-8o6f8Q_3C8AMCI5-R8xPHak8FM=.b2a0f7b3-e582-4859-8806-6b2ee43289ea@github.com> On Tue, 4 Apr 2023 14:32:42 GMT, Coleen Phillimore wrote: > I tested the fix for JDK-8304743 on tier1-8, then did this refactoring which required a ciKlass::get_Klass() call that failed. This fix reverts the refactoring. > > Here was the commit for JDK-8304743: https://github.com/openjdk/jdk/commit/b062b1bd8126610d9288dc179d69e54a40b81015 > > I added the HandleMark to the utility function. It wasn't there in the original code. > > Tested with tier4. This pull request has now been integrated. Changeset: 2ee42451 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/2ee42451057455fdfe7c102d7a341136999e16ef Stats: 44 lines in 4 files changed: 15 ins; 24 del; 5 mod 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" Reviewed-by: dcubed ------------- PR: https://git.openjdk.org/jdk/pull/13327 From rkennke at openjdk.org Tue Apr 4 19:03:49 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 4 Apr 2023 19:03:49 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> Message-ID: On Tue, 4 Apr 2023 06:46:02 GMT, David Holmes wrote: > One thing I can't quite get clear in my head is whether the small window where an object's monitor is inflated and the object is still in the thread's lock-stack, could cause an issue for any external observers trying to determine the object's locked state. 
Most observers are thread-local and are basically asking 'am I locking this object?'. Almost all cases where an external thread is checking lightweight- and monitor locks are doing so from a safepoint. The only exception seems to be the path through management.cpp that you are mentioning below, and you say we're ok there (and I tend to agree). > I'm still unclear how we can have non-JavaThreads here. Only JavaThreads can lock object monitors. I've checked again. This particular case is only reached through verification and can easily be solved. The other instance of similar check in synchronizer.cpp seems no longer be called from non-Java-thread, it's probably been fixed by some other change earlier in this PR. In any case, I changed these code paths to assume Java thread and it's not failing tier1 and tier2, but may be worth to do more extensive testing. > src/hotspot/share/runtime/synchronizer.cpp line 1314: > >> 1312: LogStreamHandle(Trace, monitorinflation) lsh; >> 1313: if (mark.is_fast_locked()) { >> 1314: assert(LockingMode == 2, "can only happen with new lightweight locking"); > > I'd rather see this entire case guarded by "`if (LockingMode ==2)`" and have `is_fast_locked()` assert if called in any other mode. Similarly for the stack-locked case below. Ok, I am doing this change. But it means many more LockingMode == 1 or 2 checks in other places, and also means that we need to do a raw-check under VerifyHeavyMonitor paths that the mark-word is not stack-/fast-locked. Overall it does look cleaner and gives a better idea which code path deals with what kind of locking. > Not at all clear to me how this fits here. ?? This block checks whether the monitor is in waiting state. When it is anonymously locked it must be waiting. I added a comment. 
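The thread-local query mentioned above ("am I locking this object?") can be illustrated with a toy model of the per-thread lock-stack. This is only a sketch in Java under stated assumptions, not the HotSpot code: the class name `LockStackSketch` and its method names are hypothetical, the mark-word CAS that accompanies a real fast-lock is omitted, and the capacity of 8 is taken from the PR description. A `false` return stands for "take the slow path", which the discussion says happens on overflow, on recursive locking, and when the object being unlocked is not on top of the stack.

```java
// Toy model of the per-thread lock-stack; not HotSpot code.
final class LockStackSketch {
    static final int CAPACITY = 8;            // fixed size, per the PR description
    private final Object[] elems = new Object[CAPACITY];
    private int top;                          // number of entries in use

    // "Am I locking this object?": linear scan, compared by identity.
    boolean contains(Object o) {
        for (int i = 0; i < top; i++) {
            if (elems[i] == o) return true;
        }
        return false;
    }

    // Fast-path lock. Returns false when the caller must take the slow path
    // (stack full, or the object is already on the stack, i.e. recursive).
    boolean tryPush(Object o) {
        if (top == CAPACITY || contains(o)) return false;
        elems[top++] = o;
        return true;
    }

    // Fast-path unlock: succeeds only if o is the most recently locked object.
    boolean tryPop(Object o) {
        if (top == 0 || elems[top - 1] != o) return false;
        elems[--top] = null;
        return true;
    }
}
```

Because the array is small, the linear scan in `contains` stays cheap, which matches the claim in the PR text that the common ownership query is answered by a quick scan.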
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1496449012 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157647952 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157651107 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157652282 From rkennke at openjdk.org Tue Apr 4 19:03:50 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 4 Apr 2023 19:03:50 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> <2f4eLmsH_dQMV6eQfr1PR0nOIjWWMqe2LDzIk48kBHw=.7bd76dee-a18a-426f-8ea9-cd17e0717352@github.com> Message-ID: On Tue, 4 Apr 2023 12:57:38 GMT, Thomas Stuefe wrote: >> Thanks @tstuefe . I see at that level if the object doesn't match the top of the lock-stack then we take the slow path. But then I'm lost - AFAICS the slow path is `InterpreterRuntime::monitorexit` and that doesn't have any fast-locking code in it at all ??? > > I'm not sure what you mean. `InterpreterRuntime::monitorexit` will enter `ObjectSynchronizer::exit` which handles the fast-locking case under `if (LockingMode == 2)...`. Or am I misunderstanding you? > > (I really wish for named constants instead of `1` and `2` constants though...) > Is this code used for monitorexit or only returning from synchronized methods? If used for monitorexit there is no requirement that the monitor being unlocked was the last monitor locked. Balanced locking only requires the locks and unlocks are matched, not that they are perfectly nested ie. this is valid in bytecode: > > ``` > monitorenter A > monitorenter B > monitorexit A > monitorexit B > ``` As Thomas mentioned, we check this in the interpreter paths, which are the only paths where this can happen. C1 and C2 would reject bytecode with unstructured locking.
That's why we can assert for structured locking in MacroAssembler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157649461 From rkennke at openjdk.org Tue Apr 4 19:09:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 4 Apr 2023 19:09:33 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> Message-ID: On Tue, 4 Apr 2023 19:01:44 GMT, Roman Kennke wrote: >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaVFrame.java line 85: >> >>> 83: ( // we have marked ourself as pending on this monitor >>> 84: mark.monitor().equals(thread.getCurrentPendingMonitor()) || >>> 85: mark.monitor().isOwnedAnonymous() || >> >> Not at all clear to me how this fits here. ?? > >> Not at all clear to me how this fits here. ?? > > This block checks whether the monitor is in waiting state. When it is anonymously locked it must be waiting. I added a comment. > Given the owner could release the monitor the moment after we check I don't see how false results are an issue here. The existing code should be safe when not executed at a safepoint.. I checked again. It looks like the DeadLock test now passes even if I let the code in management.cpp go check stacks without safepoint. I believe the addition of the start_processing() to LockStack::contains() fixes the ZGC problem. But please, run the full tests again on Mach5. I don't see any failures here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157654320 From rkennke at openjdk.org Tue Apr 4 19:09:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 4 Apr 2023 19:09:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v53] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements.
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Address David's review comments - Allow scanning lock-stack outside safepoint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/13c84b5c..839f350b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=52 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=51-52 Stats: 47 lines in 10 files changed: 5 ins; 22 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Tue Apr 4 20:40:03 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 4 Apr 2023 20:40:03 GMT Subject: RFR: 8303563: GetCurrentThreadCpuTime and GetThreadCpuTime need further clarification for virtual threads Message-ID: This is a follow-up to [JDK-8302615](https://bugs.openjdk.org/browse/JDK-8302615) where GetCurrentThreadCpuTime and GetThreadCpuTime were changed from being not supported to optional, when called from/with a virtual thread. There are two additional sentences that need adjustment to avoid creating a conflict in the spec. In the functions `GetCurrentThreadCpuTime` and `GetThreadCpuTime`: The fragment: `"The current thread may not be a virtual thread.
Otherwise, the error code"` is replaced with: "An implementation is not required to support this function when the current thread is a virtual thread, in which case" CSR: [JDK-8305617](https://bugs.openjdk.org/browse/JDK-8305617) ------------- Commit messages: - 8303563: GetCurrentThreadCpuTime and GetThreadCpuTime need further clarification for virtual threads Changes: https://git.openjdk.org/jdk/pull/13338/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13338&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303563 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13338.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13338/head:pull/13338 PR: https://git.openjdk.org/jdk/pull/13338 From lkorinth at openjdk.org Tue Apr 4 20:51:13 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 4 Apr 2023 20:51:13 GMT Subject: RFR: 8305618: Move gcold out of tier1 Message-ID: 8305618: Move gcold out of tier1 Remove gcold out from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. ------------- Commit messages: - removed gcold from tier1_gc Changes: https://git.openjdk.org/jdk/pull/13340/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13340&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305618 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13340.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13340/head:pull/13340 PR: https://git.openjdk.org/jdk/pull/13340 From iklam at openjdk.org Tue Apr 4 21:15:00 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 4 Apr 2023 21:15:00 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block Message-ID: This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. 
This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. **Notes for reviewers:** - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). - It might be easier to see the diff with whitespaces off. - There are two major changes in the G1 code - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) - Testing changes: - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. - Many tests that use this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant.
**Testing:** - Mach5 tiers 1 ~ 7 ------------- Commit messages: - Remove archive region types from G1 - clean up (1) - 8298048: Combine CDS archive heap into a single block Changes: https://git.openjdk.org/jdk/pull/13284/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8298048 Stats: 2995 lines in 75 files changed: 110 ins; 2271 del; 614 mod Patch: https://git.openjdk.org/jdk/pull/13284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284 PR: https://git.openjdk.org/jdk/pull/13284 From lmesnik at openjdk.org Tue Apr 4 21:31:04 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 4 Apr 2023 21:31:04 GMT Subject: RFR: 8305618: Move gcold out of tier1 In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 20:42:35 GMT, Leo Korinth wrote: > 8305618: Move gcold out of tier1 > > Remove gcold out from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13340#pullrequestreview-1371817232 From dholmes at openjdk.org Tue Apr 4 22:07:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Apr 2023 22:07:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52] In-Reply-To: References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> <2f4eLmsH_dQMV6eQfr1PR0nOIjWWMqe2LDzIk48kBHw=.7bd76dee-a18a-426f-8ea9-cd17e0717352@github.com> Message-ID: On Tue, 4 Apr 2023 12:57:38 GMT, Thomas Stuefe wrote: >> Thanks @tstuefe . I see at that level if the object doesn't match the top of the lock-stack then we take the slow path. But then I'm lost - AFAICS the slow path is `InterpreterRuntime::monitorexit` and that doesn't have any fast-locking code in it at all ??? > > I'm not sure what you mean. 
`InterpreterRuntime::monitorexit` will enter `ObjectSynchronizer::exit` which handles the fast-locking case under `if (LockingMode == 2)...`. Or am I misunderstanding you?
>
> (I really wish for named constants instead of `1` and `2` constants though...)

Thanks @tstuefe. I misread something.

> (I really wish for named constants instead of 1 and 2 constants though...)

Yeah but then we are back at the "what do we call this" problem :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157813697

From dholmes at openjdk.org Tue Apr 4 22:15:41 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 4 Apr 2023 22:15:41 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52]
In-Reply-To:
References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com>
Message-ID: <3Xk6QvHPcMBoY8Su5aEHQBkkDjPtRKUtkm7Avnq-K_k=.6f85a673-69a2-48d1-90b2-0994f346fd00@github.com>

On Tue, 4 Apr 2023 19:04:03 GMT, Roman Kennke wrote:

>>> Not at all clear to me how this fits here. ??
>>
>> This block checks whether the monitor is in waiting state. When it is anonymously locked it must be waiting. I added a comment.
>
>> Given the owner could release the monitor the moment after we check I don't see how false results are an issue here. The existing code should be safe when not executed at a safepoint.
>
> I checked again. It looks like the DeadLock test now passes even if I let the code in management.cpp go check stacks without safepoint. I believe the addition of the start_processing() to LockStack::contains() fixes the ZGC problem. But please, run the full tests again on Mach5. I don't see any failures here.

> When it is anonymously locked it must be waiting.

I guess I am unclear what "waiting" refers to here, and which "thread" we are checking for what. If the monitor is anonymously locked then we know it is contended - perhaps that is what this "waiting" means?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157819177

From psandoz at openjdk.org Tue Apr 4 23:57:06 2023
From: psandoz at openjdk.org (Paul Sandoz)
Date: Tue, 4 Apr 2023 23:57:06 GMT
Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Apr 2023 13:46:12 GMT, Quan Anh Mai wrote:

>> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as to intrinsify the `Vector::slice` method.
>>
>> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations.
>>
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   style

Tier 1-3 tests now pass.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12909#issuecomment-1496737872

From dcubed at openjdk.org Wed Apr 5 00:17:44 2023
From: dcubed at openjdk.org (Daniel D.
Daugherty)
Date: Wed, 5 Apr 2023 00:17:44 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v53]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Apr 2023 19:09:31 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
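
The lock-stack behaviour described in the quoted PR text can be sketched as a small stand-alone model. This is only an illustration under stated assumptions, not HotSpot code: `ToyLockStack`, `Oop`, `LockResult` and `try_fast_lock` are hypothetical names, the CAS on the object header is left out entirely, and a `contains()` scan stands in for the header check that would detect a recursive lock attempt.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for an object reference (an "oop"); hypothetical, for illustration only.
using Oop = const void*;

// Toy model of a per-thread lock-stack: a small fixed-size array of oops.
class ToyLockStack {
public:
    static constexpr std::size_t kCapacity = 8;  // size named in the PR description

    bool is_full() const { return top_ == kCapacity; }
    std::size_t size() const { return top_; }

    // Record a newly fast-locked object. Callers check is_full() first.
    void push(Oop obj) {
        assert(!is_full() && "a full lock-stack means taking the slow path");
        slots_[top_++] = obj;
    }

    // Fast-path unlock: succeeds only if obj is the most recently locked object.
    bool pop_if_top(Oop obj) {
        if (top_ == 0 || slots_[top_ - 1] != obj) return false;
        --top_;
        return true;
    }

    // "Does the current thread own me?" answered by a short linear scan.
    bool contains(Oop obj) const {
        for (std::size_t i = 0; i < top_; ++i) {
            if (slots_[i] == obj) return true;
        }
        return false;
    }

private:
    std::array<Oop, kCapacity> slots_{};
    std::size_t top_ = 0;
};

enum class LockResult { FastLocked, NeedsInflation };

// Overflow and recursive locking both fall back to a full monitor in this model.
LockResult try_fast_lock(ToyLockStack& ls, Oop obj) {
    if (ls.is_full() || ls.contains(obj)) return LockResult::NeedsInflation;
    ls.push(obj);
    return LockResult::FastLocked;
}
```

In this model a ninth distinct lock, or a second lock on an object already on the stack, reports `NeedsInflation`, which corresponds to the slow path that inflates to a monitor in the description. The real scheme additionally has to cope with concurrent readers of the lock-stack, which is exactly the safety question raised elsewhere in this thread and which the sketch does not address.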
>>
>> Testing:
>>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Address David's review comments
>  - Allow scanning lock-stack outside safepoint

I've updated my repos with v52 and I'm doing Mach5 testing on it. I've still only reviewed up thru v36 so I have some catching up to do.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1496755688

From dholmes at openjdk.org Wed Apr 5 00:17:45 2023
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 5 Apr 2023 00:17:45 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v52]
In-Reply-To: <3Xk6QvHPcMBoY8Su5aEHQBkkDjPtRKUtkm7Avnq-K_k=.6f85a673-69a2-48d1-90b2-0994f346fd00@github.com>
References: <7jOmtn6ogSPGZYMvJmV7yPWUgR5sBImxQaf2F8vQZBc=.6f4cad6d-058b-48a5-a943-ff2f18f90d3a@github.com> <3Xk6QvHPcMBoY8Su5aEHQBkkDjPtRKUtkm7Avnq-K_k=.6f85a673-69a2-48d1-90b2-0994f346fd00@github.com>
Message-ID:

On Tue, 4 Apr 2023 22:12:03 GMT, David Holmes wrote:

>>> Given the owner could release the monitor the moment after we check I don't see how false results are an issue here.
The existing code should be safe when not executed at a safepoint.
>>
>> I checked again. It looks like the DeadLock test now passes even if I let the code in management.cpp go check stacks without safepoint. I believe the addition of the start_processing() to LockStack::contains() fixes the ZGC problem. But please, run the full tests again on Mach5. I don't see any failures here.
>
>> When it is anonymously locked it must be waiting.
>
> I guess I am unclear what "waiting" refers to here, and which "thread" we are checking for what. If the monitor is anonymously locked then we know it is contended - perhaps that is what this "waiting" means?

> The existing code should be safe when not executed at a safepoint.

Just to be clear I meant the code before your changes should be safe. Your code needs to establish it is safe - which takes us back to the issue of querying the lock-stack while it may be being concurrently pushed/popped.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1157820627

From vitaly.provodin at jetbrains.com Wed Apr 5 01:24:02 2023
From: vitaly.provodin at jetbrains.com (Vitaly Provodin)
Date: Wed, 5 Apr 2023 08:24:02 +0700
Subject: crash logs without stack trace
In-Reply-To:
References: <14C16303-1377-4968-93C8-B5347ACF806F@jetbrains.com>
Message-ID: <5587FCF1-42D9-4CC3-AAA0-F7BDE33E0031@jetbrains.com>

Thanks for the replies.

Unfortunately we cannot 100% reproduce these crashes, but they happen from time to time. Most probably debugging core dumps is the only way to understand what happened. We will notify you about any issues (if any) or report them in JBS.

From dholmes at openjdk.org Wed Apr 5 01:51:14 2023
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 5 Apr 2023 01:51:14 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v16]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Apr 2023 14:39:19 GMT, Markus Grönlund wrote:

>> Greetings,
>>
>> We are adding support to let JFR report on Agents.
>>
>> #### Design
>>
>> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize.
>>
>> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html)
>>
>> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
>>
>> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples:
>>
>> // Command line
>> jdk.JavaAgent {
>>   startTime = 12:31:19.789 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "foo=bar"
>>   dynamic = false
>>   initializationTime = 12:31:15.574 (2023-03-08)
>>   initializationDuration = 172 ms
>> }
>>
>> // Dynamic load
>> jdk.JavaAgent {
>>   startTime = 12:31:31.158 (2023-03-08)
>>   name = "JavaAgent.jar"
>>   options = "bar=baz"
>>   dynamic = true
>>   initializationTime = 12:31:31.037 (2023-03-08)
>>   initializationDuration = 64,1 ms
>> }
>>
>> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
>>
>> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar.
>>
>> The "dynamic" field denotes whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true).
>>
>> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method.
>>
>> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime".
>>
>> An agent can also be written in a native programming language using the [JVM Tool Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>> jdk.NativeAgent {
>>   startTime = 12:31:40.398 (2023-03-08)
>>   name = "jdwp"
>>   options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>   dynamic = false
>>   initializationTime = 12:31:36.142 (2023-03-08)
>>   initializationDuration = 0,00184 ms
>>   path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>> }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported.
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp.
>>
>> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp.
>>
>> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>>
>> Testing: jdk_jfr, tier 1 - 6
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   renames

Renamings look good to me.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/12923#pullrequestreview-1372019560 From dholmes at openjdk.org Wed Apr 5 02:15:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Apr 2023 02:15:15 GMT Subject: RFR: 8303563: GetCurrentThreadCpuTime and GetThreadCpuTime need further clarification for virtual threads In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 20:33:46 GMT, Serguei Spitsyn wrote: > This is a follow-up to [JDK-8302615](https://bugs.openjdk.org/browse/JDK-8302615) where GetCurrentThreadCpuTime and GetThreadCpuTime were changed from being not supported to optional, when called from/with a virtual thread. There are two additional sentences that need adjustment to avoid creating a conflict in the spec. > > In the functions `GetCurrentThreadCpuTime` and `GetThreadCpuTime`: > > The fragment: > `"The current thread may not be a virtual thread. Otherwise, the error code"` > > is replaced with: > > "An implementation is not required to support this function > when the current thread is a virtual thread, in which case" > > > CSR: [JDK-8305617](https://bugs.openjdk.org/browse/JDK-8305617) Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13338#pullrequestreview-1372034930 From dholmes at openjdk.org Wed Apr 5 02:36:12 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Apr 2023 02:36:12 GMT Subject: RFR: 8305509: C1 fails "assert(k != nullptr) failed: illegal use of unloaded klass" In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:32:42 GMT, Coleen Phillimore wrote: > I tested the fix for JDK-8304743 on tier1-8, then did this refactoring which required a ciKlass::get_Klass() call that failed. This fix reverts the refactoring. > > Here was the commit for JDK-8304743: https://github.com/openjdk/jdk/commit/b062b1bd8126610d9288dc179d69e54a40b81015 > > I added the HandleMark to the utility function. It wasn't there in the original code. > > Tested with tier4. 
I'll add an after-the-fact review. I needed to understand where the code review had failed and see now how the original code uses a `ciKlass` not a `klass`. Backout looks good. Thanks. ------------- PR Review: https://git.openjdk.org/jdk/pull/13327#pullrequestreview-1372050576 From dholmes at openjdk.org Wed Apr 5 02:43:03 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Apr 2023 02:43:03 GMT Subject: RFR: 8305618: Move gcold out of tier1 In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 20:42:35 GMT, Leo Korinth wrote: > 8305618: Move gcold out of tier1 > > Remove gcold out from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. You've removed tier1_gc_gcold from tier1_gc as stated, but shouldn't that group now be renamed as it is no longer part of tier1? And does it need to be added into another tier? ------------- PR Review: https://git.openjdk.org/jdk/pull/13340#pullrequestreview-1372060499 From alanb at openjdk.org Wed Apr 5 06:09:05 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 5 Apr 2023 06:09:05 GMT Subject: RFR: 8303563: GetCurrentThreadCpuTime and GetThreadCpuTime need further clarification for virtual threads In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 20:33:46 GMT, Serguei Spitsyn wrote: > This is a follow-up to [JDK-8302615](https://bugs.openjdk.org/browse/JDK-8302615) where GetCurrentThreadCpuTime and GetThreadCpuTime were changed from being not supported to optional, when called from/with a virtual thread. There are two additional sentences that need adjustment to avoid creating a conflict in the spec. > > In the functions `GetCurrentThreadCpuTime` and `GetThreadCpuTime`: > > The fragment: > `"The current thread may not be a virtual thread. 
Otherwise, the error code"`
>
> is replaced with:
>
> "An implementation is not required to support this function
> when the current thread is a virtual thread, in which case"
>
>
> CSR: [JDK-8305617](https://bugs.openjdk.org/browse/JDK-8305617)

Marked as reviewed by alanb (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/13338#pullrequestreview-1372207726

From sspitsyn at openjdk.org Wed Apr 5 06:48:13 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 5 Apr 2023 06:48:13 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v15]
In-Reply-To: <96JFUTmjTmM45mitEipdK8x2jqjo2uhqF42031X6LRQ=.11d4b13f-7d95-426a-8fc9-50e7dfb8f8ab@github.com>
References: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com> <96JFUTmjTmM45mitEipdK8x2jqjo2uhqF42031X6LRQ=.11d4b13f-7d95-426a-8fc9-50e7dfb8f8ab@github.com>
Message-ID: <_qDvAFz7h-1lyB_hYlBCN6Za8wQlvJZriTDHVIYXmB4=.9ce998bf-5d9c-449e-9ec7-937fcd224f97@github.com>

On Tue, 4 Apr 2023 14:41:13 GMT, Markus Grönlund wrote:

>> src/hotspot/share/prims/agent.hpp line 1:
>>
>>> 1: /*
>>
>> The name for class and file is too general.
>> I'm wondering whether renaming the files to jvmtiAgent and the class to JvmtiAgent would work.
>> In general, there exists a convention to name JVMTI files with the "jvmti" prefix.
>> It is a gray zone between Runtime and JVMTI but seems to belong more to JVMTI.
>> The same applies to the AgentList class and file.
>> Also, these new files are good candidates to add here:
>>
>> make/hotspot/lib/JvmFeatures.gmk:
>> ifneq ($(call check-jvm-feature, jvmti), true)
>>   JVM_CFLAGS_FEATURES += -DINCLUDE_JVMTI=0
>>   JVM_EXCLUDE_FILES += jvmtiGetLoadedClasses.cpp jvmtiThreadState.cpp jvmtiExtensions.cpp \
>>       jvmtiImpl.cpp jvmtiManageCapabilities.cpp jvmtiRawMonitor.cpp jvmtiUtil.cpp jvmtiTrace.cpp \
>>       jvmtiCodeBlobEvents.cpp jvmtiEnv.cpp jvmtiRedefineClasses.cpp jvmtiEnvBase.cpp jvmtiEnvThreadState.cpp \
>>       jvmtiTagMap.cpp jvmtiEventController.cpp evmCompat.cpp jvmtiEnter.xsl jvmtiExport.cpp \
>>       jvmtiClassFileReconstituter.cpp jvmtiTagMapTable.cpp
>> endif
>
> Hi Serguei, thanks for taking a look.
>
> I have updated with the names you suggested. Thanks.

Thank you for the update. It looks good to me.
I still need to finish my review. Sorry for the latency. It is a busy time now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1158087046

From sspitsyn at openjdk.org Wed Apr 5 06:59:18 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 5 Apr 2023 06:59:18 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v15]
In-Reply-To:
References: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com>
Message-ID:

On Mon, 3 Apr 2023 12:59:12 GMT, Markus Grönlund wrote:

>> src/hotspot/share/prims/agentList.cpp line 204:
>>
>>> 202:
>>> 203: // Invokes Agent_OnAttach for agents loaded dynamically during runtime.
>>> 204: jint AgentList::load_agent(const char* agent_name, const char* absParam,
>>
>> I feel that it is better to keep the original function name "load_agent_library". As you listed, there are two kinds of agents: Java and Native. The function name gives a hint that it is a native agent. Also, it is better to avoid changes that aren't really necessary.
>
> I changed the names because I found it very hard to understand what the old names represented: "AgentLibrary" vs "Library"? "add_init_agent" vs "add_instrumentation_agent", or even "add_loaded_agent"? Also a bit confusing that "load_agent_library" would also include statically linked agents - no library is loaded there.

Okay. Refactoring is usually not easy to review. With a renaming it becomes harder, so it is better to be conservative.
There are other side effects to consider:
- backporting also becomes harder
- developers have to learn new names instead of the already known ones

The good side is that your refactoring consolidates this code in well-known locations. :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1158094815

From sspitsyn at openjdk.org Wed Apr 5 07:52:07 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 5 Apr 2023 07:52:07 GMT
Subject: RFR: 8303563: GetCurrentThreadCpuTime and GetThreadCpuTime need further clarification for virtual threads
In-Reply-To:
References:
Message-ID:

On Tue, 4 Apr 2023 20:33:46 GMT, Serguei Spitsyn wrote:

> This is a follow-up to [JDK-8302615](https://bugs.openjdk.org/browse/JDK-8302615) where GetCurrentThreadCpuTime and GetThreadCpuTime were changed from being not supported to optional, when called from/with a virtual thread. There are two additional sentences that need adjustment to avoid creating a conflict in the spec.
>
> In the functions `GetCurrentThreadCpuTime` and `GetThreadCpuTime`:
>
> The fragment:
> `"The current thread may not be a virtual thread. Otherwise, the error code"`
>
> is replaced with:
>
> "An implementation is not required to support this function
> when the current thread is a virtual thread, in which case"
>
>
> CSR: [JDK-8305617](https://bugs.openjdk.org/browse/JDK-8305617)

Thank you for the review, David and Alan!

------------- PR Comment: https://git.openjdk.org/jdk/pull/13338#issuecomment-1497065943 From aph at openjdk.org Wed Apr 5 08:22:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 5 Apr 2023 08:22:15 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 08:00:20 GMT, Hao Sun wrote: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. 
SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of some exception scenarios (Recall the reason why we fail to use SP as the modifier). > > Finally, we choose to use value zero as the modifier. Trivially, it's compatible with virtual threads. However, compared to FP modifier, this solution would reduce the strength of PAC-RET protection to some extent. E.g., you get the same authentication code for each call to the function, whereas using FP gives you different codes as long as the stack depth is different. > > ### Implementation of Zero modifier > > Here list the key updates of this patch. > > 1. vm_version_aarch64.cpp > > Remove the constraint on "enable-preview" and "PreserveFramePointer". > > 2. macroAssembler_aarch64.cpp > > For utility protect_return_address(), 1) use PACIAZ/PACIZA instructions directly. 2) argument "temp_reg" is removed since all functions use the same modifier. 3) all the use sites are updated accordingly. This involves the updates in many files. 
> > Similar updates are done to utility authenticate_return_address(). > > Besides, aarch64.ad and AArch64TestAssembler.java are updated accordingly. > > 3. pauth_linux_aarch64.inline.hpp > > For utilities pauth_sign_return_address() and > pauth_authenticate_return_address(), remove the second argument and pass value zero to r16 register. > > Similarly, all the use sites are updated as well. This involves the updates in many files. > > 4. continuationHelper_aarch64.inline.hpp > > Introduce return_pc_at() and patch_pc_at() to avoid directly reading the saved PC or writing new signed PC on the stack in shared code. > > 5. Minor updates > > (1) sharedRuntime_aarch64.cpp: Add the missing > authenticate_return_address() use for function gen_continuation_enter(). In functions generate_deopt_blob() and generate_uncommon_trap_blob(), remove the authentication on the caller (3) frame since the return address is not used. > > (2) stubGenerator_aarch64.cpp: Add the missing > authenticate_return_address() use for function generate_cont_thaw(). > > (3) runtime.cpp: enable the authentication. > > ### Test > > 1. Cross compilations on arm32/s390/ppc/riscv passed. > 2. zero build and x86 build passed. > 3. tier1~3 passed on Linux/AArch64 w/ and w/o PAC-RET. > > Co-Developed-by: Nick Gasson > > [1] https://bugs.openjdk.org/browse/JDK-8277204 > [2] https://openjdk.org/jeps/425 > [3] https://github.com/openjdk/jdk/pull/9067 > [4] https://bugs.openjdk.org/browse/JDK-8288023 > [5] https://bugs.openjdk.org/browse/JDK-8301819 > [6] https://openjdk.org/jeps/444 > [7] https://www.usenix.org/conference/usenixsecurity21/presentation/liljestrand > [8] https://github.com/openjdk/jdk/pull/10441 This is going to take time to review. It's very intrusive, and PAC/RET places a burden on maintainers because it will double the testing effort whenever HotSpot is changed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1497103341 From shade at openjdk.org Wed Apr 5 08:35:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Apr 2023 08:35:15 GMT Subject: RFR: 8305618: Move gcold out of tier1 In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 02:40:43 GMT, David Holmes wrote: > You've removed tier1_gc_gcold from tier1_gc as stated, but shouldn't that group now be renamed as it is no longer part of tier1? And does it need to be added into another tier? +1. Shenandoah runs both gcold and gcbasher in tier3, should we run these in tier3 as well? Otherwise they would get caught by tier4, which is too long for many to run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13340#issuecomment-1497119093 From jwaters at openjdk.org Wed Apr 5 09:39:19 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Apr 2023 09:39:19 GMT Subject: RFR: 8305341: Alignment should be enforced by alignas instead of compiler specific attributes [v2] In-Reply-To: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> References: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> Message-ID: > C11 has been stable for a long time on all platforms, so native code can use the standard alignas operator for alignment requirements Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: - gcc offset_of - Remove Visual C++ alignment - Remove gcc alignment - globalDefinitions.hpp - Merge branch 'openjdk:master' into patch-6 - - GSSLibStub.c - ArrayReferenceImpl.c - Alignment outside of HotSpot should be enforced by alignas instead of compiler specific attributes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13258/files - new: https://git.openjdk.org/jdk/pull/13258/files/7e5e6449..7dc7f7d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13258&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13258&range=00-01 Stats: 16524 lines in 394 files changed: 7231 ins; 8051 del; 1242 mod Patch: https://git.openjdk.org/jdk/pull/13258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13258/head:pull/13258 PR: https://git.openjdk.org/jdk/pull/13258 From jwaters at openjdk.org Wed Apr 5 09:39:22 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Apr 2023 09:39:22 GMT Subject: RFR: 8305341: Alignment should be enforced by alignas instead of compiler specific attributes In-Reply-To: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> References: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> Message-ID: <648nGU6epXFxCy9rfbKldQsb_LN7wqrjkzQMN9n3z98=.4d090b19-8f13-4740-a595-17b0564d3a54@github.com> On Fri, 31 Mar 2023 06:07:39 GMT, Julian Waters wrote: > C11 has been stable for a long time on all platforms, so native code can use the standard alignas operator for alignment requirements Hmm, right I'll link that issue into this one as well then ------------- PR Comment: https://git.openjdk.org/jdk/pull/13258#issuecomment-1497188210 From mgronlun at openjdk.org Wed Apr 5 09:48:21 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 5 Apr 2023 09:48:21 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: 
References: Message-ID: On Wed, 5 Apr 2023 01:48:19 GMT, David Holmes wrote: > Renamings look good to me. Thank you for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1497209787 From mgronlun at openjdk.org Wed Apr 5 09:48:23 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 5 Apr 2023 09:48:23 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v15] In-Reply-To: References: <5cFyTNQZjfRp6VlzOqkgdwhoSGaX92KNL3EZlv-NrpY=.fae84f7f-f1d4-4354-a123-33ab97928dcf@github.com> Message-ID: On Wed, 5 Apr 2023 06:55:16 GMT, Serguei Spitsyn wrote: >> I changed the names because I found it very hard to understand what the old names represented: "AgentLibrary" vs "Library"? "add_init_agent" vs "add_instrumentation_agent", or even "add_loaded_agent"? Also a bit confusing that "load_agent_library" would also include statically linked agents - no library is loaded there. > > Okay. > Refactoring is usually not easy to review. > With a renaming it becomes harder, so it is better to be conservative. > > There are other side effects to consider: > > - back porting also becomes harder > - developers have to learn new names instead of already known > > The good side is that your refactoring consolidates this code in a well known locations. :-) Of course, I would not have changed this unless I believe it improves things. The abstraction is now better from the perspective of the rest of the VM. There are now only JVMTI agents, and they are kept in a list. Arguments.cpp adds agents to the list. The same thing for the diagnosticCommand.cpp for dynamically loaded agents. Threads.cpp loads the JVMTI agents, java.cpp unloads agents. All other sites take out an iterator of the subtypes they want to iterate. There is no longer any separation between "agent" and "library"; the subtypes of the agents are now abstracted away. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1158282424 From rkennke at openjdk.org Wed Apr 5 10:40:54 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 5 Apr 2023 10:40:54 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v54] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' 
is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. 
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
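The fast path described above can be sketched in a few lines of C++. This is a simplified, illustrative model and not the code from the PR: `Obj`, `LockStack`, `fast_lock` and `fast_unlock` are hypothetical names, the object header is reduced to a single atomic word, and the encoding 01 for "unlocked" is an assumption of the sketch (the description only states that 00 means "fast-locked"). Inflation to a monitor is not modelled; the functions simply return false wherever the description says the slow path would be taken.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Low two header bits encode the lock state.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kUnlocked   = 0b01;  // assumption made for this sketch
constexpr std::uintptr_t kFastLocked = 0b00;  // as stated in the description

// Hypothetical stand-in for an object with a header word.
struct Obj {
    std::atomic<std::uintptr_t> header{kUnlocked};
};

// Hypothetical per-thread lock stack: a small fixed-size array of objects
// that the owning thread currently holds fast-locked.
class LockStack {
public:
    static constexpr std::size_t kCapacity = 8;

    bool is_full() const { return top_ == kCapacity; }

    // "Does the current thread own me?" is a short linear scan.
    bool contains(const Obj* o) const {
        for (std::size_t i = 0; i < top_; ++i) {
            if (slots_[i] == o) return true;
        }
        return false;
    }

    void push(Obj* o) {
        assert(!is_full());
        slots_[top_++] = o;
    }

    // Remove one entry and close the gap.
    void remove(const Obj* o) {
        for (std::size_t i = 0; i < top_; ++i) {
            if (slots_[i] == o) {
                for (std::size_t j = i + 1; j < top_; ++j) slots_[j - 1] = slots_[j];
                --top_;
                return;
            }
        }
    }

private:
    Obj* slots_[kCapacity] = {};
    std::size_t top_ = 0;
};

// Returns true if the fast path succeeded; false means "take the slow path"
// (lock stack full, recursive lock attempt, or the object is not unlocked).
bool fast_lock(LockStack& ls, Obj* o) {
    if (ls.is_full() || ls.contains(o)) return false;
    std::uintptr_t old_hdr = o->header.load();
    if ((old_hdr & kLockMask) != kUnlocked) return false;
    std::uintptr_t new_hdr = (old_hdr & ~kLockMask) | kFastLocked;
    if (!o->header.compare_exchange_strong(old_hdr, new_hdr)) return false;
    ls.push(o);  // the upper header bits stay untouched; ownership lives here
    return true;
}

// Returns true if the fast path succeeded; false means "take the slow path"
// (not owned by this thread, or the header is no longer fast-locked).
bool fast_unlock(LockStack& ls, Obj* o) {
    if (!ls.contains(o)) return false;
    std::uintptr_t old_hdr = o->header.load();
    if ((old_hdr & kLockMask) != kFastLocked) return false;
    std::uintptr_t new_hdr = (old_hdr & ~kLockMask) | kUnlocked;
    if (!o->header.compare_exchange_strong(old_hdr, new_hdr)) return false;
    ls.remove(o);
    return true;
}
```

In this model each thread would hold its own `LockStack` (for example as a `thread_local`), and a caller would fall back to a monitor-based path whenever either function returns false. The ANONYMOUS_OWNER handover between a contending thread and the fast-lock owner is deliberately left out.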
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 change | aarch64 stack-locking | aarch64 fast-locking | aarch64 change
> -- | -- | -- | -- | -- | -- | --
> AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69%
> Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21%
> Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59%
> ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15%
> GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11%
> LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97%
> MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01%
> NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54%
> PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11%
> FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40%
> FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12%
> ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34%
> Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40%
> RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74%
> Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24%
> ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58%
> ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53%
> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37%
> Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97%
> FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39%
> FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04%

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Named constants for LockingMode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/839f350b..baf71624 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=53 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=52-53 Stats: 157 lines in 37 files changed: 9 ins; 0 del; 148 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From lkorinth at openjdk.org Wed Apr 5 13:09:13 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 5 Apr 2023 13:09:13 GMT Subject: RFR: 8305618: Move gcold out of tier1 In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 20:42:35 GMT, Leo Korinth wrote: > 8305618: Move gcold out of tier1 > > Remove gcold from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. 
I can add tier1_gc_gcold to tier2 or tier3 (and rename it to correct tier number), or I could just remove it and it will run in tier4. What do you prefer? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13340#issuecomment-1497456222 From pminborg at openjdk.org Wed Apr 5 14:07:47 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 5 Apr 2023 14:07:47 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v17] In-Reply-To: References: Message-ID: <9PKnuMc3_0dEdBMHR6hectXHSSYnUVCC5i0jdemzVjA=.998f626e-e2d9-4b03-9ccb-1b93c1082b27@github.com> > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Update JEP number and name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/928ad35e..3a4d9f66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From jcking at openjdk.org Wed Apr 5 14:07:11 2023 From: jcking at openjdk.org (Justin King) Date: Wed, 5 Apr 2023 14:07:11 GMT Subject: RFR: JDK-8305320: DbgStrings and AsmRemarks are leaking In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 18:30:41 GMT, Justin King wrote: > Fix two leaks related to `DbgStrings` and `AsmRemarks` in debug builds. Friendly poke. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13249#issuecomment-1497547535 From pminborg at openjdk.org Wed Apr 5 14:20:46 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 5 Apr 2023 14:20:46 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v18] In-Reply-To: References: Message-ID: <_PWutk51h6HTKm9BEzpkSLUxUXjDmg3YHQM-ojIQkZI=.ed9ed673-e70e-4f45-9496-e16408796427@github.com> > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Improve code snipet ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/3a4d9f66..183d3511 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=16-17 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From pminborg at openjdk.org Wed Apr 5 14:48:29 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 5 Apr 2023 14:48:29 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v19] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 33 commits: - Merge master - Improve code snipet - Update JEP number and name - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> - Cleanup finality - Merge pull request #1 from JornVernee/Fix_ULE Fix ULE when intializing LibFallback - fix ULE when intializing LibFallback - Remove unused method and declare class final - Fix copyrigth year issues - ... and 23 more: https://git.openjdk.org/jdk/compare/2e59d21e...0ee65ac1 ------------- Changes: https://git.openjdk.org/jdk/pull/13079/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=18 Stats: 13133 lines in 268 files changed: 5016 ins; 6013 del; 2104 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From lmesnik at openjdk.org Wed Apr 5 15:03:07 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 5 Apr 2023 15:03:07 GMT Subject: RFR: 8305618: Move gcold out of tier1 In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 08:32:28 GMT, Aleksey Shipilev wrote: > You've removed tier1_gc_gcold from tier1_gc as stated, but shouldn't that group now be renamed as it is no longer part of tier1? And does it need to be added into another tier? There is some misunderstanding in the gc tiers/groups naming and usage. We use only tier1_gc and hotspot_gc groups in our testing and don't use hotspot tier2/tier3 test groups directly. Thus, the gc_old tests would be executed with all other tests from hotspot_gc. And we just not going to use this test group. I am fine with renaming it and adding to the tier3 if is more convenient for openjdk users. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13340#issuecomment-1497638188 From cslucas at openjdk.org Wed Apr 5 15:52:33 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Apr 2023 15:52:33 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v4] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <6NDwZSpjSrokmglncPRp4tM7_Hiq4b26dXukhXODpKo=.8ba7efd0-bc44-4f1e-beb8-c1c68bc33515@github.com> Message-ID: On Fri, 24 Mar 2023 19:02:57 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Add support for SR'ing some inputs of merges used for field loads > > src/hotspot/share/code/debugInfo.hpp line 199: > >> 197: // ObjectValue describing an object that was scalar replaced. >> 198: >> 199: class ObjectMergeValue: public ScopeValue { > > Why you did not make subclass of ObjectValue? You would need to check `sv->is_object_merge()` first before `sv->is_object()` in few places. But on other hand you don't need to duplicates ObjectValue`s fields and asserts. Hi @vnkozlov, just FYI. I made the changes that you suggested. Please let me know what you think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1158700435 From cslucas at openjdk.org Wed Apr 5 15:52:29 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Apr 2023 15:52:29 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v6] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? 
> > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically in: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straight forward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also tested with several applications and didn't see any failure. 
I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/5ef86371..3752b21a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=04-05 Stats: 346 lines in 3 files changed: 113 ins; 106 del; 127 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From cslucas at openjdk.org Wed Apr 5 15:52:34 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Apr 2023 15:52:34 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v4] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <6NDwZSpjSrokmglncPRp4tM7_Hiq4b26dXukhXODpKo=.8ba7efd0-bc44-4f1e-beb8-c1c68bc33515@github.com> <7xRwVRVapKbqiVQMDMZUh3ILhfaYub_brXWVopFhJ8M=.28289c04-0ff0-4f19-b764-03af4d3155d6@github.com> Message-ID: On Sat, 25 Mar 2023 00:07:20 GMT, Vladimir Kozlov wrote: >> I had considered that but decided not to do it to prevent adding a new IR node. I'll give that a shot and update this thread with how it goes. > > It **will** complicate your DebugInfo code (packing/unpacking) information. But I think it is right thing to do to avoid duplicated re-allocations during deoptimization - you should have only one new object. Hi @vnkozlov, just FYI. 
I made the changes that you suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1158699338 From cslucas at openjdk.org Wed Apr 5 15:52:37 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Apr 2023 15:52:37 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v5] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Sat, 1 Apr 2023 00:44:55 GMT, Xin Liu wrote: > Do you consider to perform the transformation in MacroExpand? Your prior changes have already removed NSR marks, ME/SR will consider 'ptn'. Yes, I actually did. However, that makes the changes much more complicated. I patched this method to reuse the scalar replacement method in MacroExpand so that we don't have code duplication. I hope that's sufficient as a first implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1157914272 From cslucas at openjdk.org Wed Apr 5 15:52:38 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Apr 2023 15:52:38 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v5] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 31 Mar 2023 18:24:45 GMT, Xin Liu wrote: > It looks like we can use (safepoints == nullptr) instead? Yeap. Thanks. I don't know how I missed that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1157909570 From cslucas at openjdk.org Wed Apr 5 15:52:39 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Apr 2023 15:52:39 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v5] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Wed, 5 Apr 2023 00:59:29 GMT, Cesar Soares Lucas wrote: >> Do you really need the boolean parameter ignore_merges here? >> It looks like we can use (safepoints == nullptr) instead? > >> It looks like we can use (safepoints == nullptr) instead? > > Yeap. Thanks. I don't know how I missed that. > With ignore_merges, why we also skip EncodeP or MemBarRelease here? The EncodeP shouldn't prevent the reduction of Phi because I check how the Phi is used. The MemBarRelease node shouldn't prevent the reduction because once the Allocate input to the Phi is set to SR the MemBarRelease node will be removed as part of Ideal transformations after EA. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1157910405 From cslucas at openjdk.org Wed Apr 5 15:52:40 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 5 Apr 2023 15:52:40 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v4] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <6NDwZSpjSrokmglncPRp4tM7_Hiq4b26dXukhXODpKo=.8ba7efd0-bc44-4f1e-beb8-c1c68bc33515@github.com> <0UbMqMHtVIayPdJMmfDF6YTadWe4YTlSW6mZc5P3IU8=.c4b1a292-e434-4c57-a5cd-015edca2ec95@github.com> Message-ID: On Fri, 31 Mar 2023 18:38:43 GMT, Xin Liu wrote: >> I see, you use it in escape.cpp. Okay. I need to review changes there too. 
> > or you could construct a temporary PhaseMacroExpand object in EA. > > I see that you convert many member function to static so you can query in EA. the only blocker is _igvn. That seems a good idea. Together with some other refactoring I decided to revert making the methods static and instead use them through an instance of ME. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1158698606 From dcubed at openjdk.org Wed Apr 5 15:54:52 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 5 Apr 2023 15:54:52 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v54] In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 10:40:54 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. 
Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Named constants for LockingMode

v52 is failing quite a few JVM/TI tests with crashes that look like this:

    #  Internal Error (/opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S40935/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b154597d-2ba7-420d-81c1-ef13f408c137/runs/d52e181d-f011-47c8-a35f-30fcbba5c164/workspace/open/src/hotspot/share/runtime/javaThread.hpp:983), pid=1112738, tid=1112747
    #  assert(t->is_Java_thread()) failed: incorrect cast to JavaThread
    #
    # JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2023-04-04-2141101.daniel.daugherty.8291555forjdk21.git)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-internal-LTS-2023-04-04-2141101.daniel.daugherty.8291555forjdk21.git, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
    # Problematic frame:
    # V  [libjvm.so+0x155baa4]  is_lock_owned(Thread*, oop)+0x254

    snip

    Stack: [0x0000fffdb8970000,0x0000fffdb8b70000],  sp=0x0000fffdb8b6d480,  free space=2037k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V  [libjvm.so+0x155baa4]  is_lock_owned(Thread*, oop)+0x254  (javaThread.hpp:983)
    V  [libjvm.so+0x15627d0]  ObjectSynchronizer::FastHashCode(Thread*, oop)+0x720  (synchronizer.cpp:956)
    V  [libjvm.so+0x12dbedc]  oopDesc::slow_identity_hash()+0x68  (oop.cpp:112)
    V  [libjvm.so+0x105f044]  JvmtiTagMapTable::find(oop)+0x1c4  (oop.inline.hpp:364)
    V  [libjvm.so+0x1056744]  CallbackWrapper::CallbackWrapper(JvmtiTagMap*, oop)+0xa4  (jvmtiTagMap.cpp:222)
    V  [libjvm.so+0x105c538]  CallbackInvoker::invoke_advanced_stack_ref_callback(jvmtiHeapReferenceKind, long, long, int, _jmethodID*, long, int, oop)+0x158  (jvmtiTagMap.cpp:1754)
    V  [libjvm.so+0x105d688]  JNILocalRootsClosure::do_oop(oop*)+0x168  (jvmtiTagMap.cpp:2019)
    V  [libjvm.so+0xe8bac8]  JNIHandleBlock::oops_do(OopClosure*)+0x68  (jniHandles.cpp:411)
    V  [libjvm.so+0x105ddbc]  VM_HeapWalkOperation::collect_stack_roots(JavaThread*, JNILocalRootsClosure*)+0x6dc  (jvmtiTagMap.cpp:2725)
    V  [libjvm.so+0x105e368]  VM_HeapWalkOperation::collect_stack_roots()+0x138  (jvmtiTagMap.cpp:2772)
    V  [libjvm.so+0x10555f0]  VM_HeapWalkOperation::doit()+0x6e0  (jvmtiTagMap.cpp:2827)
    V  [libjvm.so+0x1686ac0]  VM_Operation::evaluate()+0x120  (vmOperations.cpp:71)
    V  [libjvm.so+0x16b2710]  VMThread::evaluate_operation(VM_Operation*)+0xd0  (vmThread.cpp:281)
    V  [libjvm.so+0x16b3204]  VMThread::inner_execute(VM_Operation*)+0x374  (vmThread.cpp:428)
    V  [libjvm.so+0x16b33fc]  VMThread::loop()+0x8c  (vmThread.cpp:495)
    V  [libjvm.so+0x16b352c]  VMThread::run()+0x9c  (vmThread.cpp:175)
    V  [libjvm.so+0x15ad4b0]  Thread::call_run()+0xac  (thread.cpp:224)
    V  [libjvm.so+0x130c0a8]  thread_native_entry(Thread*)+0x134  (os_linux.cpp:740)
    C  [libpthread.so.0+0x7908]  start_thread+0x188

This code block in `ObjectSynchronizer::FastHashCode()`:

          // Fall thru so we only have one place that installs the hash in
          // the ObjectMonitor.
        } else if (LockingMode == 2 && mark.is_fast_locked() && is_lock_owned(current, obj)) {
          // This is a fast-lock owned by the calling thread so use the
          // markWord from the object.
          hash = mark.hash();
          if (hash != 0) {                  // if it has a hash, just return it
            return hash;
          }
        } else if (LockingMode == 1 && mark.has_locker() && current->is_lock_owned((address)mark.locker())) {

is calling this static function:

    static bool is_lock_owned(Thread* thread, oop obj) {
      assert(LockingMode == 2, "only call this with new lightweight locking enabled");
      return JavaThread::cast(thread)->lock_stack().contains(obj);
    }

and that function used to look like this:

     static bool is_lock_owned(Thread* thread, oop obj) {
       assert(LockingMode == 2, "only call this with new lightweight locking enabled");
    -  return thread->is_Java_thread() ? JavaThread::cast(thread)->lock_stack().contains(obj) : false;
    +  return JavaThread::cast(thread)->lock_stack().contains(obj);
     }

so that `thread->is_Java_thread()` check is needed since the VMThread is the one that's making this `is_lock_owned()` check as part of a hashcode operation.

There are 129 test failures in Mach5 Tier4 and 769 test failures in Mach5 Tier5. I don't know yet whether all are due to:

    #  Internal Error (/opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S40935/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b154597d-2ba7-420d-81c1-ef13f408c137/runs/d52e181d-f011-47c8-a35f-30fcbba5c164/workspace/open/src/hotspot/share/runtime/javaThread.hpp:983), pid=1112738, tid=1112747
    #  assert(t->is_Java_thread()) failed: incorrect cast to JavaThread

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1497720340
PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1497725623

From rkennke at openjdk.org Wed Apr 5 16:17:44 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 5 Apr 2023 16:17:44 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v55]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Put back thread type check in OS::is_lock_owned()

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/baf71624..963de0ef

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=54
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=53-54

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From cslucas at openjdk.org Wed Apr 5 16:31:20 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 5 Apr 2023 16:31:20 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v7]
In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

> Can I please get reviews for this PR?
>
> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested.
>
> With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N:
>
> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png)
>
> What are the most common users of allocation merges?
I.e., if the same node type uses a Phi N times the counter is incremented by 1:
>
> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png)
>
> This PR adds support for scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list).
>
> The approach I used for _rematerialization_ is pretty straightforward. It basically consists of: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges.
>
> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless.
>
> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also tested with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.

Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

 - Merge with Master
 - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method.
 - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges.
 - Add support for SR'ing some inputs of merges used for field loads
 - Fix some typos and do some small refactorings.
 - Merge master
 - Add support for rematerializing scalar replaced objects participating in allocation merges

-------------

Changes: https://git.openjdk.org/jdk/pull/12897/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=06
  Stats: 2193 lines in 22 files changed: 1939 ins; 107 del; 147 mod
  Patch: https://git.openjdk.org/jdk/pull/12897.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897

PR: https://git.openjdk.org/jdk/pull/12897

From dcubed at openjdk.org Wed Apr 5 16:50:54 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Wed, 5 Apr 2023 16:50:54 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v55]
In-Reply-To:
References:
Message-ID:

On Wed, 5 Apr 2023 16:17:44 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Put back thread type check in OS::is_lock_owned()

I'm worried that I'll run into more cases where a JavaThread check was removed.
However, I'll start with this: $ git diff diff --git a/src/hotspot/share/runtime/synchronizer.cpp b/src/hotspot/share/runtime/synchronizer.cpp index bf8afffd693..3c269885f97 100644 --- a/src/hotspot/share/runtime/synchronizer.cpp +++ b/src/hotspot/share/runtime/synchronizer.cpp @@ -894,9 +894,11 @@ static inline intptr_t get_next_hash(Thread* current, oop obj) { return value; } +// Can be called from non JavaThreads (e.g., VMThread) for FastHashCode +// calculations as part of JVM/TI tagging. static bool is_lock_owned(Thread* thread, oop obj) { assert(LockingMode == 2, "only call this with new lightweight locking enabled"); - return JavaThread::cast(thread)->lock_stack().contains(obj); + return thread->is_Java_thread() ? JavaThread::cast(thread)->lock_stack().contains(obj) : false; } intptr_t ObjectSynchronizer::FastHashCode(Thread* current, oop obj) { and see what else testing shakes out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1497815036 From shade at openjdk.org Wed Apr 5 17:05:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Apr 2023 17:05:15 GMT Subject: RFR: 8305618: Move gcold out of tier1 In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 13:06:43 GMT, Leo Korinth wrote: > I can add tier1_gc_gcold to tier2 or tier3 (and rename it to correct tier number), or I could just remove it and it will run in tier4. What do you prefer? I prefer tier3. GC stress tests belong there, I think. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13340#issuecomment-1497833746 From cjplummer at openjdk.org Wed Apr 5 19:08:24 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 5 Apr 2023 19:08:24 GMT Subject: RFR: 8250269: Replace ATTRIBUTE_ALIGNED with alignas [v15] In-Reply-To: References: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> Message-ID: On Sat, 4 Feb 2023 15:05:06 GMT, Julian Waters wrote: >> C++11 added the alignas attribute, for the purpose of specifying alignment on types, much like compiler specific syntax such as gcc's __attribute__((aligned(x))) or Visual C++'s __declspec(align(x)). >> >> We can phase out the use of the macro in favor of the standard attribute. In the meantime, we can replace the compiler specific definitions of ATTRIBUTE_ALIGNED with a portable definition. We might deprecate the use of the macro but changing its implementation quickly and cleanly applies the feature where the macro is being used. >> >> Note: With certain parts of HotSpot using ATTRIBUTE_ALIGNED so indiscriminately, this commit will likely take some time to get right >> >> This will require adding the alignas attribute to the list of language features approved for use in HotSpot code. (Completed with [8297912](https://github.com/openjdk/jdk/pull/11446)) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - ... and 5 more: https://git.openjdk.org/jdk/compare/0e1d6550...a621bb62 This PR should be closed since the changes have been merged into #13258. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11431#issuecomment-1497981801 From dcubed at openjdk.org Wed Apr 5 19:37:45 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 5 Apr 2023 19:37:45 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v55] In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 16:17:44 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). 
The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. 
Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? 
| 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Put back thread type check in OS::is_lock_owned() All 8 test tasks that failed in my Tier4 passed with the patch that I posted above. The Tier4 is still not yet complete, but I'm going to go ahead and retest Tier5 to see if anything else shakes out. @rkennke - Please consider adding the comment I mentioned above: +// Can be called from non JavaThreads (e.g., VMThread) for FastHashCode +// calculations as part of JVM/TI tagging. static bool is_lock_owned(Thread* thread, oop obj) { That should prevent anyone from thinking that `is_lock_owned` can be optimized again. 
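For readers following along, the guard discussed above can be modeled in isolation. This is only a minimal sketch under stated assumptions: `Thread`, `JavaThread`, `LockStack` and `oop` below are simplified stand-ins written for illustration, not the real HotSpot classes, and the 8-element capacity is taken from the PR description. It shows why a non-Java thread (e.g., the VMThread) must be answered with `false` rather than cast to `JavaThread`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative stand-in: in HotSpot an oop is an object reference.
using oop = const void*;

// Hypothetical model of the per-thread lock-stack (a small fixed array of oops).
class LockStack {
  static constexpr std::size_t CAPACITY = 8;  // size quoted in the PR description
  std::array<oop, CAPACITY> _base{};
  std::size_t _top = 0;

 public:
  bool is_full() const { return _top == CAPACITY; }

  void push(oop o) {
    assert(!is_full() && "caller must take the slow path when full");
    _base[_top++] = o;
  }

  // Linear scan; cheap because the stack holds only a handful of entries.
  bool contains(oop o) const {
    auto end = _base.begin() + _top;
    return std::find(_base.begin(), end, o) != end;
  }
};

// Hypothetical thread hierarchy: only JavaThread carries a lock-stack.
class Thread {
 public:
  virtual ~Thread() = default;
  virtual bool is_Java_thread() const { return false; }
};

class JavaThread : public Thread {
  LockStack _lock_stack;

 public:
  bool is_Java_thread() const override { return true; }
  LockStack& lock_stack() { return _lock_stack; }
};

// Can be called with non-JavaThreads; those never own a fast-lock,
// so the type check must come before the downcast.
static bool is_lock_owned(Thread* thread, oop obj) {
  if (!thread->is_Java_thread()) {
    return false;
  }
  return static_cast<JavaThread*>(thread)->lock_stack().contains(obj);
}
```

Without the `is_Java_thread()` check, the downcast on a plain `Thread` would read a lock-stack that does not exist, which is the kind of failure the patch above guards against.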
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1498013391 From jwaters at openjdk.org Wed Apr 5 19:44:27 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Apr 2023 19:44:27 GMT Subject: Withdrawn: 8250269: Replace ATTRIBUTE_ALIGNED with alignas In-Reply-To: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> References: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> Message-ID: On Wed, 30 Nov 2022 12:25:37 GMT, Julian Waters wrote: > C++11 added the alignas attribute, for the purpose of specifying alignment on types, much like compiler specific syntax such as gcc's __attribute__((aligned(x))) or Visual C++'s __declspec(align(x)). > > We can phase out the use of the macro in favor of the standard attribute. In the meantime, we can replace the compiler specific definitions of ATTRIBUTE_ALIGNED with a portable definition. We might deprecate the use of the macro but changing its implementation quickly and cleanly applies the feature where the macro is being used. > > Note: With certain parts of HotSpot using ATTRIBUTE_ALIGNED so indiscriminately, this commit will likely take some time to get right > > This will require adding the alignas attribute to the list of language features approved for use in HotSpot code. (Completed with [8297912](https://github.com/openjdk/jdk/pull/11446)) This pull request has been closed without being integrated. 
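As a rough illustration of the idea in the quoted description, a portable definition of the macro could simply expand to the standard specifier. This is a sketch only: the macro body, `PaddedCounter`, `kMask` and `is_aligned` are assumptions made up for this example, and the actual HotSpot definition (and the placements it has to support) may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical portable definition replacing the compiler-specific
// __attribute__((aligned(x))) / __declspec(align(x)) variants.
#define ATTRIBUTE_ALIGNED(x) alignas(x)

// On a type: the specifier goes between the class-key and the name.
struct ATTRIBUTE_ALIGNED(64) PaddedCounter {
  std::uint64_t value;
};

// On a variable: the specifier goes at the start of the declaration.
ATTRIBUTE_ALIGNED(16) static const std::uint32_t kMask[4] = {1u, 2u, 4u, 8u};

// Helper for checking an address against an alignment at run time.
static bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Note that `alignas` is accepted in fewer positions than the gcc attribute, which is presumably part of why the description expects the conversion to take some time to get right.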
------------- PR: https://git.openjdk.org/jdk/pull/11431 From amenkov at openjdk.org Wed Apr 5 20:21:12 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 5 Apr 2023 20:21:12 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack Message-ID: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> The fix updates the JVMTI FollowReferences implementation to report references from virtual threads: - added heap scanning to report unmounted vthreads; - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), and references are reported with the correct thread id/class and object tags/frame depth; - common code to handle stack frames is moved into a separate class; ------------- Commit messages: - tab - improved test - Merge branch 'openjdk:master' into vthread_follow_ref - update - tabs again - tabs - update - proto Changes: https://git.openjdk.org/jdk/pull/13254/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8299414 Stats: 733 lines in 3 files changed: 636 ins; 86 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From cjplummer at openjdk.org Wed Apr 5 20:21:16 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 5 Apr 2023 20:21:16 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 30 Mar 2023 22:58:12 GMT, Alex Menkov wrote: > The fix updates the JVMTI FollowReferences implementation to report references from virtual threads: > - added heap scanning to report
unmounted vthreads; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), and references are reported with the correct thread id/class and object tags/frame depth; > - common code to handle stack frames is moved into a separate class; test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 54: > 52: } > 53: } > 54: await(dumpedLatch); await() seems unnecessary given the use of the !timeToStop flag. test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 83: > 81: System.out.println(referenced.getClass()); > 82: }); > 83: vthreadEnded.join(); Add a comment that says something like "Make sure this vthread has exited so we can test that it no longer holds any stack references". test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 85: > 83: vthreadEnded.join(); > 84: > 85: Thread.sleep(2000); // wait for reference and unmount I think what you mean is you need to wait until the threads have made enough progress to create the references, and then you need to wait until they have had a chance to unmount due to the await() call. This should be made more clear in the comments. BTW, you could choose to get JVMTI VIRTUAL_THREAD_UNMOUNT events, and instead block here until you get them all, but doing a sleep is a lot easier. test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 94: > 92: // expected to be unreported as stack local > 93: new TestCase(VThreadUnmountedEnded.class, 0, 0) > 94: }; I think it would be useful to have a test case which has expected_cnt > 1.
test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 72: > 70: jvmtiHeapReferenceInfoStackLocal *stackInfo = (jvmtiHeapReferenceInfoStackLocal *)reference_info; > 71: refCounters.count[index]++; > 72: refCounters.threadId[index] = stackInfo->thread_id; If `count` is >1 at this point, can this line be an assert? I assume the threadId should never change for any given index once it is set. ------------- PR Review: https://git.openjdk.org/jdk/pull/13254#pullrequestreview-1369892549 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156534139 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156534779 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156538712 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156540510 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156520354 From amenkov at openjdk.org Wed Apr 5 20:21:18 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 5 Apr 2023 20:21:18 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Mon, 3 Apr 2023 23:11:49 GMT, Chris Plummer wrote: >> The fix updates the JVMTI FollowReferences implementation to report references from virtual threads: >> - added heap scanning to report unmounted vthreads; >> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), and references are reported with the correct thread id/class and object tags/frame depth; >> - common code to handle stack frames is moved into a separate class; > > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 54: > >> 52: } >> 53: } >> 54: await(dumpedLatch); > > await() seems unnecessary given the use of the !timeToStop flag. Correct. Fixed.
> test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 83: > >> 81: System.out.println(referenced.getClass()); >> 82: }); >> 83: vthreadEnded.join(); > > Add a comment that says something like "Make sure this vthread has exited so we can test that it no longer holds any stack references". Fixed. > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 85: > >> 83: vthreadEnded.join(); >> 84: >> 85: Thread.sleep(2000); // wait for reference and unmount > > I think what you mean is you need to wait until the threads have made enough progress to create the references, and then you need to wait until they have had a chance to unmount due to the await() call. This should be made more clear in the comments. > > BTW, you could choose to get JVMTI VIRTUAL_THREAD_UNMOUNT events, and instead block here until you get them all, but doing a sleep is a lot easier. Added a comment. Sleep does the job; I don't think it makes sense to overcomplicate the test. > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 94: > >> 92: // expected to be unreported as stack local >> 93: new TestCase(VThreadUnmountedEnded.class, 0, 0) >> 94: }; > > I think it would be useful to have a test case which has expected_cnt > 1. expected_cnt > 1 means there are references to 2 objects of the class or 2 references to the same object. I don't see how this would improve test coverage. Having 1 (or 0) reference to each object helps to keep the test simple. > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 72: > >> 70: jvmtiHeapReferenceInfoStackLocal *stackInfo = (jvmtiHeapReferenceInfoStackLocal *)reference_info; >> 71: refCounters.count[index]++; >> 72: refCounters.threadId[index] = stackInfo->thread_id; > > If `count` is >1 at this point, can this line be an assert? I assume the threadId should never change for any given index once it is set.
If count is > 1, the test will fail later when verifying the value. I added "ERROR" logging. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156603534 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156603638 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156605011 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156573479 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1156603364 From cjplummer at openjdk.org Wed Apr 5 20:37:08 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 5 Apr 2023 20:37:08 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 30 Mar 2023 22:58:12 GMT, Alex Menkov wrote: > The fix updates the JVMTI FollowReferences implementation to report references from virtual threads: > - added heap scanning to report unmounted vthreads; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), and references are reported with the correct thread id/class and object tags/frame depth; > - common code to handle stack frames is moved into a separate class; test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 43: > 41: * mounted and unmounted virtual threads and reports correct thread id > 42: * (for mounted vthread it should be vthread id, and not carrier thread id).
> 43: * Additionally tests that references from platform threads aree reported correctly "aree" -> "are" test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 93: > 91: createObjAndWait(VThreadMountedJNIReferenced.class); > 92: Reference.reachabilityFence(referenced); > 93: }); This code used to use a java loop to keep busy, but now it relies on a sleep loop in native code. Was the java loop problematic? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1158982377 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1158995466 From dcubed at openjdk.org Wed Apr 5 20:42:58 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 5 Apr 2023 20:42:58 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v55] In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 16:17:44 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. 
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? 
| 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Put back thread type check in OS::is_lock_owned() src/hotspot/cpu/aarch64/aarch64.ad line 3843: > 3841: > 3842: if (!UseHeavyMonitors) { > 3843: if (LockingMode == LIGHTWEIGHT) { You should consider changing uses of `UseHeavyMonitors` to the appropriate check of `LockingMode`. For this case, use: ` if (LockingMode != MONITOR) {` You'll also need to change the implementation of the `UseHeavyMonitors` option to set `LockingMode = MONITOR` or something like that. 
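To make the suggestion above concrete, here is a small stand-alone sketch of how the legacy flag could be folded into the mode selector so that code generators only ever test `LockingMode`. Everything here is an assumption for illustration: the `Flags` struct, the `LEGACY` enumerator, the numeric values of `MONITOR` and `LEGACY`, and the helper names are not taken from the PR (only `LIGHTWEIGHT`, `MONITOR` and the `LockingMode == 2` assert appear in the thread).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical mode values; LIGHTWEIGHT == 2 matches the assert quoted
// earlier in the thread, the other two values are assumed.
enum LockingModeKind { MONITOR = 0, LEGACY = 1, LIGHTWEIGHT = 2 };

// Hypothetical stand-in for the VM's global flags.
struct Flags {
  bool UseHeavyMonitors = false;
  int LockingMode = LEGACY;
};

// Argument processing: the old option becomes an alias for the monitor mode.
static void apply_legacy_flag(Flags& f) {
  if (f.UseHeavyMonitors) {
    f.LockingMode = MONITOR;
  }
}

// Code-generation side: branch on LockingMode only, never on UseHeavyMonitors.
static const char* select_lock_path(const Flags& f) {
  if (f.LockingMode != MONITOR) {
    if (f.LockingMode == LIGHTWEIGHT) {
      return "lightweight";  // push onto the lock-stack
    }
    return "legacy";         // classic stack-locking
  }
  return "monitor";          // always inflate to an ObjectMonitor
}
```

With this shape, each platform file has a single source of truth for the locking scheme, and the old option keeps working for users who still pass it.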
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1158999786

From amenkov at openjdk.org Wed Apr 5 21:09:09 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Wed, 5 Apr 2023 21:09:09 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v2]
In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

> The fix updates JVMTI FollowReferences implementation to report references from virtual threads:
> - added heap scanning to report unmounted vthreads;
> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth;
> - common code to handle stack frames is moved into a separate class;

Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:

  aree -> are

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13254/files
  - new: https://git.openjdk.org/jdk/pull/13254/files/0eb9b050..8108f217

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254

PR: https://git.openjdk.org/jdk/pull/13254

From amenkov at openjdk.org Wed Apr 5 21:09:13 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Wed, 5 Apr 2023 21:09:13 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v2]
In-Reply-To:
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

On Wed, 5 Apr 2023 20:25:39 GMT, Chris
Plummer wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> aree -> are > > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 43: > >> 41: * mounted and unmounted virtual threads and reports correct thread id >> 42: * (for mounted vthread it should be vthread id, and not carrier thread id). >> 43: * Additionally tests that references from platform threads aree reported correctly > > "aree" -> "are" Fixed > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 93: > >> 91: createObjAndWait(VThreadMountedJNIReferenced.class); >> 92: Reference.reachabilityFence(referenced); >> 93: }); > > This code used to use a java loop to keep busy, but now it relies on a sleep loop in native code. Was the java loop problematic? No. I had a failure of the test due racing, but I believe the reason was lack of synchronization. I decided to simplify the test - 2 virtual threads are enough and as the test verifies "JNI local on top frame" case, it needs block in native call anyway, so I use it to prevent unmount too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1159022346 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1159020544 From dlong at openjdk.org Thu Apr 6 00:28:04 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 6 Apr 2023 00:28:04 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 08:00:20 GMT, Hao Sun wrote: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. 
However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. 
Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of some exception scenarios (Recall the reason why we fail to use SP as the modifier). > > Finally, we choose to use value zero as the modifier. Trivially, it's compatible with virtual threads. However, compared to FP modifier, this solution would reduce the strength of PAC-RET protection to some extent. E.g., you get the same authentication code for each call to the function, whereas using FP gives you different codes as long as the stack depth is different. > > ### Implementation of Zero modifier > > Here list the key updates of this patch. > > 1. vm_version_aarch64.cpp > > Remove the constraint on "enable-preview" and "PreserveFramePointer". > > 2. macroAssembler_aarch64.cpp > > For utility protect_return_address(), 1) use PACIAZ/PACIZA instructions directly. 2) argument "temp_reg" is removed since all functions use the same modifier. 3) all the use sites are updated accordingly. This involves the updates in many files. > > Similar updates are done to utility authenticate_return_address(). > > Besides, aarch64.ad and AArch64TestAssembler.java are updated accordingly. > > 3. pauth_linux_aarch64.inline.hpp > > For utilities pauth_sign_return_address() and > pauth_authenticate_return_address(), remove the second argument and pass value zero to r16 register. > > Similarly, all the use sites are updated as well. This involves the updates in many files. > > 6. 
continuationHelper_aarch64.inline.hpp
>
> Introduce return_pc_at() and patch_pc_at() to avoid directly reading the saved PC or writing new signed PC on the stack in shared code.
>
> 7. Minor updates
>
> (1) sharedRuntime_aarch64.cpp: Add the missing authenticate_return_address() use for function gen_continuation_enter(). In functions generate_deopt_blob() and generate_uncommon_trap_blob(), remove the authentication on the caller (3) frame since the return address is not used.
>
> (2) stubGenerator_aarch64.cpp: Add the missing authenticate_return_address() use for function generate_cont_thaw().
>
> (3) runtime.cpp: enable the authentication.
>
> ### Test
>
> 1. Cross compilations on arm32/s390/ppc/riscv passed.
> 2. zero build and x86 build passed.
> 3. tier1~3 passed on Linux/AArch64 w/ and w/o PAC-RET.
>
> Co-Developed-by: Nick Gasson
>
> [1] https://bugs.openjdk.org/browse/JDK-8277204
> [2] https://openjdk.org/jeps/425
> [3] https://github.com/openjdk/jdk/pull/9067
> [4] https://bugs.openjdk.org/browse/JDK-8288023
> [5] https://bugs.openjdk.org/browse/JDK-8301819
> [6] https://openjdk.org/jeps/444
> [7] https://www.usenix.org/conference/usenixsecurity21/presentation/liljestrand
> [8] https://github.com/openjdk/jdk/pull/10441

Using SP seems like the right way to go. Can't we compute the correct SP value to use in Runtime1::generate_handle_exception()?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1498329206

From daniel.smith at oracle.com Thu Apr 6 00:53:46 2023
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 6 Apr 2023 00:53:46 +0000
Subject: 2023 JVM Language Summit
Message-ID: <2F52B8E8-FC9A-499B-BBC4-FD227B2CEB40@oracle.com>

2023 JVM LANGUAGE SUMMIT -- CALL FOR SPEAKERS

We are pleased to announce the 2023 JVM Language Summit to be held at Oracle's Santa Clara campus on August 7-9, 2023. Registration is now open for all attendees. Speaker submissions will be accepted through May 17.
The JVM Language Summit is an open technical collaboration among language designers, compiler writers, tool builders, runtime engineers, and VM architects. We will share our experiences as creators of both the JVM and programming languages for the JVM. We also welcome non-JVM developers of similar technologies to attend or speak on their runtime, VM, or language of choice. Presentations will be recorded and made available to the public. This event is being organized by language and JVM engineers -- no marketers involved! So bring your slide rules and be prepared for some seriously geeky discussions. The Summit will be followed by the OpenJDK Committers' Workshop on August 10-11. This year, the Workshop is a separate event with its own registration process. Please review additional details at http://jvmlangsummit.com. To register: register.jvmlangsummit.com For further information: jvmlangsummit.com Questions: inquire2023 at jvmlangsummit.com From sspitsyn at openjdk.org Thu Apr 6 01:35:22 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 6 Apr 2023 01:35:22 GMT Subject: Integrated: 8303563: GetCurrentThreadCpuTime and GetThreadCpuTime need further clarification for virtual threads In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 20:33:46 GMT, Serguei Spitsyn wrote: > This is a follow-up to [JDK-8302615](https://bugs.openjdk.org/browse/JDK-8302615) where GetCurrentThreadCpuTime and GetThreadCpuTime were changed from being not supported to optional, when called from/with a virtual thread. There are two additional sentences that need adjustment to avoid creating a conflict in the spec. > > In the functions `GetCurrentThreadCpuTime` and `GetThreadCpuTime`: > > The fragment: > `"The current thread may not be a virtual thread. 
Otherwise, the error code"` > > is replaced with: > > "An implementation is not required to support this function > when the current thread is a virtual thread, in which case" > > > CSR: [JDK-8305617](https://bugs.openjdk.org/browse/JDK-8305617) This pull request has now been integrated. Changeset: 57641190 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/5764119024be067ef7afb063a49a14ef59325af6 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8303563: GetCurrentThreadCpuTime and GetThreadCpuTime need further clarification for virtual threads Reviewed-by: dholmes, alanb ------------- PR: https://git.openjdk.org/jdk/pull/13338 From xgong at openjdk.org Thu Apr 6 01:51:17 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 6 Apr 2023 01:51:17 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 13:46:12 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. 
Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 466: > 464: @IR(counts = {IRNode.VECTOR_SLICE, "17"}) > 465: static void testB128(byte[][] dst, byte[] src1, byte[] src2) { > 466: var species = ByteVector.SPECIES_128; Suggest to define the species as a "`private static final`" field of this test class. It may make the intrinsification fail if the species is not a constant to the compiler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1159206009 From xgong at openjdk.org Thu Apr 6 01:56:16 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 6 Apr 2023 01:56:16 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v5] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 13:24:18 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. 
Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add identity, fix flags test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 327: > 325: > 326: @Test > 327: @IR(counts = {IRNode.VECTOR_SLICE, "7"}, applyIfCPUFeature = {"sse2", "true"}) How about separating the special cases (i.e. origin is `0/VLENGTH`), and using the `FailOn` check instead on them? Tests are more accurate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1159208577 From haosun at openjdk.org Thu Apr 6 02:51:15 2023 From: haosun at openjdk.org (Hao Sun) Date: Thu, 6 Apr 2023 02:51:15 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: <_xRONirw9uCWIt7pgv8pM4TKhAdYQFPO03bHZmFXTd4=.a8706cde-1e2c-46ef-b343-12256fb2bcbd@github.com> On Tue, 4 Apr 2023 08:58:35 GMT, Andrew Haley wrote: >> ### Background >> >> 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. >> >> 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. >> >> 3. PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. >> >> 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. 
Note that virtual thread is one preview feature then. >> >> 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. >> >> ### Goal >> >> This patch aims to make PAC-RET compatible with virtual threads. >> >> ### Requirements of virtual threads >> >> R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. >> >> R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. >> >> Note that more details can be found in the discussion [3]. >> >> ### Investigation >> >> We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. >> >> 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. >> >> 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. >> >> 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of some exception scenarios (Recall the reason why we fail to use SP as the modifier). 
>> >> Finally, we choose to use value zero as the modifier. Trivially, it's compatible with virtual threads. However, compared to FP modifier, this solution would reduce the strength of PAC-RET protection to some extent. E.g., you get the same authentication code for each call to the function, whereas using FP gives you different codes as long as the stack depth is different. >> >> ### Implementation of Zero modifier >> >> Here list the key updates of this patch. >> >> 1. vm_version_aarch64.cpp >> >> Remove the constraint on "enable-preview" and "PreserveFramePointer". >> >> 2. macroAssembler_aarch64.cpp >> >> For utility protect_return_address(), 1) use PACIAZ/PACIZA instructions directly. 2) argument "temp_reg" is removed since all functions use the same modifier. 3) all the use sites are updated accordingly. This involves the updates in many files. >> >> Similar updates are done to utility authenticate_return_address(). >> >> Besides, aarch64.ad and AArch64TestAssembler.java are updated accordingly. >> >> 3. pauth_linux_aarch64.inline.hpp >> >> For utilities pauth_sign_return_address() and >> pauth_authenticate_return_address(), remove the second argument and pass value zero to r16 register. >> >> Similarly, all the use sites are updated as well. This involves the updates in many files. >> >> 6. continuationHelper_aarch64.inline.hpp >> >> Introduce return_pc_at() and patch_pc_at() to avoid directly reading the saved PC or writing new signed PC on the stack in shared code. >> >> 7. Minor updates >> >> (1) sharedRuntime_aarch64.cpp: Add the missing >> authenticate_return_address() use for function gen_continuation_enter(). In functions generate_deopt_blob() and generate_uncommon_trap_blob(), remove the authentication on the caller (3) frame since the return address is not used. >> >> (2) stubGenerator_aarch64.cpp: Add the missing >> authenticate_return_address() use for function generate_cont_thaw(). >> >> (3) runtime.cpp: enable the authentication. 
>>
>> ### Test
>>
>> 1. Cross compilations on arm32/s390/ppc/riscv passed.
>> 2. zero build and x86 build passed.
>> 3. tier1~3 passed on Linux/AArch64 w/ and w/o PAC-RET.
>>
>> Co-Developed-by: Nick Gasson
>>
>> [1] https://bugs.openjdk.org/browse/JDK-8277204
>> [2] https://openjdk.org/jeps/425
>> [3] https://github.com/openjdk/jdk/pull/9067
>> [4] https://bugs.openjdk.org/browse/JDK-8288023
>> [5] https://bugs.openjdk.org/browse/JDK-8301819
>> [6] https://openjdk.org/jeps/444
>> [7] https://www.usenix.org/conference/usenixsecurity21/presentation/liljestrand
>> [8] https://github.com/openjdk/jdk/pull/10441
>
> src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp line 72:
>
>> 70:
>> 71: inline address ContinuationHelper::return_pc_at(intptr_t* sp) {
>> 72:   return pauth_strip_pointer(*(address*)sp);
>
> This is the return address. it's called `return_address` elsewhere.

I used `return_pc_at()` following the usage of `ContinuationHelper::Frame::return_pc()` and `ContinuationHelper::InterpretedFrame::return_pc()`. How about `get_pc_at()` or `return_address_at()`? Thanks.

> src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp line 76:
>
>> 74:
>> 75: inline void ContinuationHelper::patch_pc_at(intptr_t* sp, address pc) {
>> 76:   *(address*)sp = pauth_sign_return_address(pc);
>
> This is a bad name. We're not patching the PC.

Thanks for your code review. From `ContinuationHelper::Frame::patch_pc(const frame& f, address pc)` and `void frame::patch_pc(Thread* thread, address pc)`, I thought "patch_pc" means replacing the original PC with the new one, i.e. the argument PC. In this function `ContinuationHelper::patch_pc_at(intptr_t* sp, address pc)`, I followed the notion that **updating the original PC at address SP with the argument PC**. That's why I used "patch_pc_at". If you think it's not a good name, how about `set_pc_at()`? Thanks.
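
To make the naming discussion concrete, here is a toy model of the two accessors. It assumes a fake "signature" kept in the top byte of the address; real PAC-RET uses hardware instructions and a secret key, and `toy_sign()` / `toy_strip()` are invented stand-ins for the `pauth_*` helpers quoted above, not their implementation. The only point it tries to show is that a modifier which does not depend on SP lets a signed slot be copied to another stack location and still be read back.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: a fixed tag in the top byte stands in for a PAC
// computed with modifier zero. No real authentication happens here.
using address = std::uint64_t;

constexpr address kTagMask = 0xFF00000000000000ULL;
constexpr address kToyTag  = 0xA500000000000000ULL;

// Hypothetical stand-in for signing a return address.
inline address toy_sign(address pc) {
  assert((pc & kTagMask) == 0);  // the raw PC must leave the tag bits free
  return pc | kToyTag;
}

// Hypothetical stand-in for stripping the signature bits.
inline address toy_strip(address signed_pc) {
  return signed_pc & ~kTagMask;
}

// Reads the saved return PC from a stack slot, hiding the signature.
inline address return_pc_at(const address* sp) {
  return toy_strip(*sp);
}

// Stores a new return PC into a stack slot in signed form.
inline void patch_pc_at(address* sp, address pc) {
  *sp = toy_sign(pc);
}
```

Because the tag in this model ignores the slot's location, a frame copied elsewhere (as freeze/thaw does with continuation stacks) needs no re-signing. Whichever names the reviewers settle on, the pair keeps shared code from touching the raw slot directly.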
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1159230915 PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1159230844 From kvn at openjdk.org Thu Apr 6 04:46:25 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Apr 2023 04:46:25 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v7] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <4uPGi8Ulap_QoQpkL1zTZUdP-jdL_WDEkpdP7asLow4=.9047ce21-688f-4d29-a643-f9acfd4344c7@github.com> On Wed, 5 Apr 2023 16:31:20 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. 
It consists basically in: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straight forward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also tested with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. > - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - Merge master > - Add support for rematerializing scalar replaced objects participating in allocation merges Thank you for adding new node - it is more clear now. src/hotspot/share/opto/callnode.hpp line 540: > 538: > 539: bool is_only_merge_sr_candidate() { return _only_merge_sr_candidate; } > 540: void set_only_merge_sr_candidate(bool only) { _only_merge_sr_candidate = only; } May be drop `_sr` from names. 
`SafePointScalarObjectNode` already represents scalarized object. src/hotspot/share/opto/escape.cpp line 633: > 631: > 632: SafePointScalarMergeNode* smerge = new SafePointScalarMergeNode(merge_t, merge_idx); > 633: smerge->init_req(0, _compile->root()); May be use ophi's control here, it should stay bellow merge point. Was there a reason you use `root`? src/hotspot/share/opto/escape.cpp line 640: > 638: > 639: // Add the selector so we know which direction the execution took > 640: sfpt->add_req(selector); May be added comment that we adding debug info for merge point here (2 values described in the comment for `_merge_pointer_idx`). src/hotspot/share/opto/escape.cpp line 655: > 653: SafePointScalarObjectNode* sobj = mexp.create_scalarized_object_description(alloc, sfpt); > 654: if (sobj == nullptr) { > 655: fatal("Failed to create SafePointScalarObjectNode!"); This is brutal! May be exit this compilation and recompile without `ReduceAllocationMerges`. src/hotspot/share/opto/escape.cpp line 658: > 656: } > 657: > 658: jvms->set_endoff(sfpt->req()); add comment explaining this line src/hotspot/share/opto/escape.cpp line 677: > 675: > 676: // Replaces debug information references to "ophi" in "sfpt" with references to "smerge" > 677: int debug_end = jvms->debug_end(); May be add comment that debug info changed (and `debug_end`) due to added scalarized objects info. src/hotspot/share/opto/escape.cpp line 679: > 677: int debug_end = jvms->debug_end(); > 678: sfpt->replace_edges_in_range(ophi, smerge, debug_start, debug_end, _igvn); > 679: sfpt->set_req(smerge->merge_pointer_idx(jvms), ophi); So you trying to restore `ophi` in debug info which was added at line 637 but then in previous line may be replaced with `smerge`. May add comment explaining that. src/hotspot/share/opto/output.cpp line 755: > 753: ciKlass* cik = t->is_oopptr()->exact_klass(); > 754: assert(cik->is_instance_klass() || > 755: cik->is_array_klass(), "Not supported allocation."); Why spacing changed? 
src/hotspot/share/opto/output.cpp line 789: > 787: > 788: for (uint i = 1; i < smerge->req(); i++) { > 789: Node* fld_node = smerge->in(i); It is not `fld_node` but `obj_node`. ------------- PR Review: https://git.openjdk.org/jdk/pull/12897#pullrequestreview-1374000788 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159249159 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159245961 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159246463 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159255417 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159253457 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159256643 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159270793 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159272308 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1159271887 From stuefe at openjdk.org Thu Apr 6 05:12:51 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Apr 2023 05:12:51 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v54] In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 10:40:54 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor.
It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Named constants for LockingMode

src/hotspot/share/utilities/globalDefinitions.hpp line 1045:

> 1043: };
> 1044:
> 1045: enum LockingMode {

Thank you for having named constants. But I'd use enum class here. Either that or prefix the names with something like "LM_", but enum class is preferred.
The names are far too generic for global scope, especially since this lives in globalDefinitions.hpp. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1159287860
From fyang at openjdk.org Thu Apr 6 06:02:55 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 6 Apr 2023 06:02:55 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v55] In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 16:17:44 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Put back thread type check in OS::is_lock_owned() src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6261: > 6259: { > 6260: // The following checks rely on the fact that LockStack is only ever modified by > 6261: // its owning stack, even if the lock got inflated concurrently; removal of LockStack Suggestion: s/its owning stack/its owning thread/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1159314439
From fyang at openjdk.org Thu Apr 6 07:30:58 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 6 Apr 2023 07:30:58 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v55] In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 16:17:44 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Put back thread type check in OS::is_lock_owned() Some update for riscv to reflect the latest changes: [riscv-update.txt](https://github.com/openjdk/jdk/files/11167462/riscv-update.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1498612617
From aph at openjdk.org Thu Apr 6 09:05:17 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 6 Apr 2023 09:05:17 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: <_xRONirw9uCWIt7pgv8pM4TKhAdYQFPO03bHZmFXTd4=.a8706cde-1e2c-46ef-b343-12256fb2bcbd@github.com> References: <_xRONirw9uCWIt7pgv8pM4TKhAdYQFPO03bHZmFXTd4=.a8706cde-1e2c-46ef-b343-12256fb2bcbd@github.com> Message-ID: <12VEPjXihdX9cqRjF26WdILuZmPjyqRtY3Bo4jcWVfo=.0f7c683c-929b-4b17-a7cc-f082f6dc8afe@github.com> On Thu, 6 Apr 2023 02:47:54 GMT, Hao Sun wrote: > Thanks for your code review.
> > From `ContinuationHelper::Frame::patch_pc(const frame& f, address pc)` and `void frame::patch_pc(Thread* thread, address pc)`, I thought "patch_pc" means replacing the original PC with the new one, i.e. the argument PC. > > In this function `ContinuationHelper::patch_pc_at(intptr_t* sp, address pc)`, I followed the notion that **updating the original PC at address SP with the argument PC**. That's why I used "patch_pc_at". > > If you think it's not a good name, how about `set_pc_at()`? Thanks. A Program Counter is a physical thing. It's made of silicon and metal. It contains an address. `patch_return_address_at()` would be fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1159496721 From haosun at openjdk.org Thu Apr 6 10:01:17 2023 From: haosun at openjdk.org (Hao Sun) Date: Thu, 6 Apr 2023 10:01:17 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 00:25:29 GMT, Dean Long wrote: > Using SP seems like the right way to go. Can't we compute the correct SP value to use in Runtime1::generate_handle_exception()? Thanks for your question, Dean. Thinking more about it, I guess we can get the expected SP value. In my local test, I always use `rfp + 16` to authenticate the return address, and test cases under `test/jdk/java/lang/Thread/` and `test/hotspot/jtreg/compiler/c2/` can pass except the `virtual thread` cases. I'm running tier1-3 now. My concern is that **using absolute SP** is incompatible with **virtual thread**, since PAC re-sign is still needed due to the stack copying process (See Requirement-2 in the commit message). Alternatively, we may want to use **relative SP** as the modifier. However, I didn't know how to get such an **initial SP** value. Do you have any idea? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1498798117 From lkorinth at openjdk.org Thu Apr 6 10:16:06 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 6 Apr 2023 10:16:06 GMT Subject: RFR: 8305618: Move gcold out of tier1 [v2] In-Reply-To: References: Message-ID: > 8305618: Move gcold out of tier1 > > Remove gcold out from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. Leo Korinth has updated the pull request incrementally with two additional commits since the last revision: - move gc groups near each others - As suggested by Shipilëv, move to tier3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13340/files - new: https://git.openjdk.org/jdk/pull/13340/files/72805395..8a6b6262 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13340&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13340&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13340.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13340/head:pull/13340 PR: https://git.openjdk.org/jdk/pull/13340 From shade at openjdk.org Thu Apr 6 10:16:07 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Apr 2023 10:16:07 GMT Subject: RFR: 8305618: Move gcold out of tier1 [v2] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 10:11:31 GMT, Leo Korinth wrote: >> 8305618: Move gcold out of tier1 >> >> Remove gcold out from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. > > Leo Korinth has updated the pull request incrementally with two additional commits since the last revision: > > - move gc groups near each others > - As suggested by Shipilëv, move to tier3 I like this. Have you confirmed these actually run in `tier3`?
------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13340#pullrequestreview-1374518614 From pminborg at openjdk.org Thu Apr 6 10:54:18 2023 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 6 Apr 2023 10:54:18 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v20] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - 8305369: Issues in zero-length memory segment javadoc section - 8305087: MemoryLayout API checks should be more eager ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/0ee65ac1..c1999447 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=18-19 Stats: 340 lines in 21 files changed: 98 ins; 184 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From mbaesken at openjdk.org Thu Apr 6 11:13:10 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 6 Apr 2023 11:13:10 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v5] In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 08:29:55 GMT, Johannes Bechberger wrote: >> Fixes the issue by disabling PCDesc cache modifications when in ASGCT. >> >> Tested on my M1 mac. > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix fix > - Fix minor issues LGTM. 
I would like a short comment line in nmethod.cpp and/or forte.cpp shortly describing what you do and why but up to you. ------------- Marked as reviewed by mbaesken (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13144#pullrequestreview-1374643965 From jbechberger at openjdk.org Thu Apr 6 11:37:48 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 6 Apr 2023 11:37:48 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v6] In-Reply-To: References: Message-ID: > Fixes the issue by disabling PCDesc cache modifications when in ASGCT. > > Tested on my M1 mac. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13144/files - new: https://git.openjdk.org/jdk/pull/13144/files/6f1108ed..5d9df9a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13144&range=04-05 Stats: 3 lines in 2 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13144/head:pull/13144 PR: https://git.openjdk.org/jdk/pull/13144 From jbechberger at openjdk.org Thu Apr 6 11:37:52 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 6 Apr 2023 11:37:52 GMT Subject: RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v5] In-Reply-To: References: Message-ID: <9Ov7dy2ZGMsxEr8rM_7L6Nt3YqYHX3k_iWlobwy3h8I=.a9200352-e1d6-4d81-a284-21e32814b9a7@github.com> On Mon, 3 Apr 2023 08:29:55 GMT, Johannes Bechberger wrote: >> Fixes the issue by disabling PCDesc cache modifications when in ASGCT. >> >> Tested on my M1 mac. 
> > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix fix > - Fix minor issues I added two comments :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1498919468 From rkennke at openjdk.org Thu Apr 6 11:59:45 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Apr 2023 11:59:45 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements).
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal p rotocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. 
In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
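
The lock-stack and the ANONYMOUS_OWNER hand-off described above can be modelled roughly as follows. This is only a toy sketch in Java, not the HotSpot code: the class `LockStackSketch`, its methods, and the `Monitor` stand-in are hypothetical names made up for illustration, header-bit CAS and memory ordering are left out, and the only value taken from the description is the capacity of 8.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of a per-thread lock-stack (hypothetical; not HotSpot code). */
final class LockStackSketch {
    /** Capacity as stated in the description ("currently with 8 elements"). */
    static final int CAPACITY = 8;

    /** Hypothetical marker meaning "fast-locked by some thread, owner unknown". */
    static final Object ANONYMOUS_OWNER = new Object();

    /** Stand-in for an inflated monitor; only the owner field is modelled. */
    static final class Monitor {
        final AtomicReference<Object> owner = new AtomicReference<>();
    }

    private final Object[] oops = new Object[CAPACITY];
    private int top = 0;

    /** Fast-path lock. Returns false when the caller must take the slow path. */
    boolean tryPush(Object o) {
        if (top == CAPACITY || contains(o)) { // overflow, or recursive lock attempt
            return false;
        }
        oops[top++] = o;
        return true;
    }

    /** "Does the current thread own me?" answered by a short linear scan. */
    boolean contains(Object o) {
        for (int i = 0; i < top; i++) {
            if (oops[i] == o) {
                return true;
            }
        }
        return false;
    }

    /** Fast-path unlock: drop o from the lock-stack if it is present. */
    void remove(Object o) {
        for (int i = 0; i < top; i++) {
            if (oops[i] == o) {
                System.arraycopy(oops, i + 1, oops, i, top - i - 1);
                oops[--top] = null;
                return;
            }
        }
    }

    /** A contending thread inflates the lock without knowing who owns it. */
    static Monitor inflateByContender() {
        Monitor m = new Monitor();
        m.owner.set(ANONYMOUS_OWNER);
        return m;
    }

    /**
     * Monitor exit as seen by the owning thread: an anonymous owner whose object
     * is on this lock-stack must be this thread, so fix up the owner, then release.
     */
    boolean exit(Object o, Monitor m, Object self) {
        if (m.owner.get() == ANONYMOUS_OWNER && contains(o)) {
            m.owner.set(self);
            remove(o);
        }
        return m.owner.compareAndSet(self, null);
    }
}
```

In this model a `false` result from `tryPush` corresponds to the two slow-path cases named in the description (a full lock-stack, or a recursive lock), both of which would inflate to a full monitor.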
>
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> |  | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> |  | x86_64 |  |  |  | aarch64 |  |  |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> |  | stack-locking | fast-locking |  |  | stack-locking | fast-locking |  |
> | AkkaUct | 841.884 | 836.948 | 0.59% |  | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% |  | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% |  | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% |  | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% |  | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% |  | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% |  | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% |  | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% |  | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% |  | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% |  | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% |  | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% |  | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% |  | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% |  | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% |  | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% |  | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |  | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% |  | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% |  | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% |  | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  RISCV update

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/963de0ef..d1c88261

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=55
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=54-55

  Stats: 73 lines in 5 files changed: 60 ins; 0 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Thu Apr 6 11:59:45 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 6 Apr 2023 11:59:45 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v55]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Apr 2023 07:27:27 GMT, Fei Yang wrote:

> Some update for riscv to reflect the latest changes: [riscv-update.txt](https://github.com/openjdk/jdk/files/11167462/riscv-update.txt)

Thank you! The patch did apply with fuzz - I pushed it as it is, can you check if it's still ok?
Thanks,
Roman

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1498945218

From alanb at openjdk.org Thu Apr 6 14:07:16 2023
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 6 Apr 2023 14:07:16 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v2]
In-Reply-To:
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

On Wed, 5 Apr 2023 21:09:09 GMT, Alex Menkov wrote:

>> The fix updates JVMTI FollowReferences implementation to report references from virtual threads:
>> - added heap scanning to report unmounted vthreads;
>> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth;
>> - common code to handle stack frames is moved into a separate class;
>
> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:
>
>   aree -> are

Do I read it correctly that the entire heap is walked to find the unmounted virtual threads?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1499116956

From dcubed at openjdk.org Thu Apr 6 14:27:14 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Thu, 6 Apr 2023 14:27:14 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Apr 2023 11:59:45 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   RISCV update

I'm rerunning my Mach5 testing of v54 + forced-enable of the new lightweight locking.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1499151050

From dcubed at openjdk.org Thu Apr 6 14:42:09 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Thu, 6 Apr 2023 14:42:09 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Apr 2023 11:59:45 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   RISCV update

src/hotspot/share/services/management.cpp line 1129:

> 1127:       // give wrong results when Java threads are running and
> 1128:       // entering/leaving locks while we inspect the thread stacks.
> 1129:       if (maxDepth == 0) {

You've removed the code that caused the safepoint code path to be taken here, so this comment is no longer correct.

Also, I've verified that:

    vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock005/TestDescription.java

no longer fails on my MBP13 so I agree that the `start_processing()` call made the ZGC -Xcomp config much happier.

src/hotspot/share/services/management.cpp line 1131:

> 1129:       if (maxDepth == 0) {
> 1130:         // No stack trace to dump so we do not need to stop the world.
> 1131:         // Since we never do the VM op here we must set the threads list.

You may want to add a little more to this comment after L1131:

    // Since we are not stopping the world, the data we gather here
    // may change the moment after we return it.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1159887650
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1159890005

From alanb at openjdk.org Thu Apr 6 15:49:27 2023
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 6 Apr 2023 15:49:27 GMT
Subject: RFR: 8304919: Implementation of Virtual Threads [v6]
In-Reply-To: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com>
References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com>
Message-ID:

> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview.
>
> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently.
>
> In addition, there are a small number of implementation changes to sync up from the loom fibers branch:
>
> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly.
> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes.
> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs.
> - New system property to print a stack trace when a virtual thread sets its own value of a TL.
> - ThreadPerTaskExecutor is changed to use FutureTask.
>
> Testing: tier1-6.

Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:

 - Test/comments updates
 - Merge
 - Expand tests for jdk.ThreadSleep event
 - Review feedback
 - Merge
 - Fix ThreadSleepEvent again
 - Test updates
 - ThreadSleepEvent refactoring
 - Merge
 - Merge
 - ... and 1 more: https://git.openjdk.org/jdk/compare/6e04878d...a5bb3fd9

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13203/files
  - new: https://git.openjdk.org/jdk/pull/13203/files/722d5afa..a5bb3fd9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=04-05

  Stats: 21808 lines in 499 files changed: 11410 ins; 8547 del; 1851 mod
  Patch: https://git.openjdk.org/jdk/pull/13203.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13203/head:pull/13203

PR: https://git.openjdk.org/jdk/pull/13203

From lkorinth at openjdk.org Thu Apr 6 16:11:33 2023
From: lkorinth at openjdk.org (Leo Korinth)
Date: Thu, 6 Apr 2023 16:11:33 GMT
Subject: RFR: 8305618: Move gcold out of tier1 [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Apr 2023 10:16:06 GMT, Leo Korinth wrote:

>> 8305618: Move gcold out of tier1
>>
>> Remove gcold from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all.
>
> Leo Korinth has updated the pull request incrementally with two additional commits since the last revision:
>
>  - move gc groups near each others
>  - As suggested by Shipilëv, move to tier3

Yes, it passes and it runs gcold.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13340#issuecomment-1499305003 From amenkov at openjdk.org Thu Apr 6 18:20:16 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 6 Apr 2023 18:20:16 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v2] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 6 Apr 2023 14:03:50 GMT, Alan Bateman wrote: > Do I read it correctly that the entire heap is walked to find the unmounted virtual threads? Correct. I don't see other way to find unmounted threads ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1499446517 From lmesnik at openjdk.org Thu Apr 6 18:57:07 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 6 Apr 2023 18:57:07 GMT Subject: RFR: 8305618: Move gcold out of tier1 [v2] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 10:16:06 GMT, Leo Korinth wrote: >> 8305618: Move gcold out of tier1 >> >> Remove gcold out from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. > > Leo Korinth has updated the pull request incrementally with two additional commits since the last revision: > > - move gc groups near each others > - As suggested by Shipil?v, move to tier3 Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13340#pullrequestreview-1375409889 From alanb at openjdk.org Thu Apr 6 18:57:15 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 6 Apr 2023 18:57:15 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v2] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 6 Apr 2023 18:17:29 GMT, Alex Menkov wrote: > Correct. I don't see another way to find unmounted threads. FollowReferences is a graph walk so it will visit the reachable virtual Threads. If I'm not mistaken, VThreadClosure will iterate over unreachable Virtual Thread objects. That might be okay as they should be terminated and thus not have any frames, but maybe the other approach needs to be explored too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1499484526 From dlong at openjdk.org Thu Apr 6 20:36:05 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 6 Apr 2023 20:36:05 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 09:58:39 GMT, Hao Sun wrote: >> Using SP seems like the right way to go. Can't we compute the correct SP value to use in Runtime1::generate_handle_exception()? > >> Using SP seems like the right way to go. Can't we compute the correct SP value to use in Runtime1::generate_handle_exception()? > > Thanks for your question, Dean. > Thinking more about it, I guess we can get the expected SP value. > > In my local test, I always use `rfp + 16` to authenticate the return address, and test cases under `test/jdk/java/lang/Thread/` and `test/hotspot/jtreg/compiler/c2/` can pass except the `virtual thread` cases. I'm running tier1-3 now.
> > My concern is that **using absolute SP** is incompatible with **virtual thread**, since PAC re-sign is still needed due to the stack copying process (See Requirement-2 in the commit message). Alternatively, we may want to use **relative SP** as the modifier. > > However, I didn't know how to get such an **initial SP** value. Do you have any idea? > Thanks. @shqking, to get a relative SP, I think you would want to subtract SP from thread->last_continuation()->entry_sp(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1499590398 From amenkov at openjdk.org Thu Apr 6 23:11:43 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 6 Apr 2023 23:11:43 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v2] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 6 Apr 2023 18:53:59 GMT, Alan Bateman wrote: > FollowReferences is a graph walk so it will visit the reachable virtual Threads. If I'm not mistake, VThreadClosure will iterate over unreachable Virtual Thread objects. That might be okay as they should be terminated and thus not have any frames, but maybe the other approach needs to be explored too. The fix is for the case when FollowReferences is called with null initial_object. Per spec in the case "references are followed from the heap roots". And: "The heap root are the set of system classes, JNI globals, references from thread stacks, and other objects used as roots for the purposes of garbage collection." FollowReferences visits all reachable virtual threads only if agent callback always returns JVMTI_VISIT_OBJECTS. If agent callback doesn't return JVMTI_VISIT_OBJECTS for some object, references from the object are not traversed, so some unmounted vthreads may be missed. VThreadClosure iterates over all VirtualThread objects, but skips mounted and terminated threads. 
After reporting, the object is marked "visited" and won't be reported again when other objects hold a reference to it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1499733642 From amenkov at openjdk.org Fri Apr 7 02:20:04 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 7 Apr 2023 02:20:04 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v3] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: <6mhiHz0YwjatkbtDRngp1M_N8QYMuD4SRWUFty_OEZ8=.c03ce00a-449f-4590-a2fe-9edc7e5f5981@github.com> > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - added heap scanning to report unmounted vthreads; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth; > - common code to handle stack frames is moved into a separate class; Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Fixed test - replaced obsolete java.util.concurrent.ForkJoinPool.common.parallelism with jdk.virtualThreadScheduler.parallelism; - added check that vthreads are mounted/unmounted; - disabled testing of JNI locals for unmounted thread as native call pins vthread and does not allow it to unmount.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/8108f217..841f5a78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=01-02 Stats: 45 lines in 1 file changed: 30 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From amenkov at openjdk.org Fri Apr 7 02:25:43 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 7 Apr 2023 02:25:43 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v4] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: <6PJnRfvOuXFiO9boUg2aQevPX6458AsY1nYQNcZDX70=.faf4c611-4f5e-4258-aef2-d9154d7ff0b2@github.com> > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - added heap scanning to report unmounted vthreads; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth; > - common code to handle stack frames is moved into a separate class; Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: trailing spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/841f5a78..47657252 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch:
git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From lmesnik at openjdk.org Fri Apr 7 02:30:45 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 7 Apr 2023 02:30:45 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v4] In-Reply-To: <6PJnRfvOuXFiO9boUg2aQevPX6458AsY1nYQNcZDX70=.faf4c611-4f5e-4258-aef2-d9154d7ff0b2@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> <6PJnRfvOuXFiO9boUg2aQevPX6458AsY1nYQNcZDX70=.faf4c611-4f5e-4258-aef2-d9154d7ff0b2@github.com> Message-ID: On Fri, 7 Apr 2023 02:25:43 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - added heap scanning to report unmounted vthreads; >> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth; >> - common code to handle stack frames is moved into a separate class; > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > trailing spaces Changes requested by lmesnik (Reviewer). test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 179: > 177: } > 178: > 179: static volatile bool timeToExit = false; It is not enough to make a variable volatile in C++. You need to make it atomic or use monitors to correctly synchronize.
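The review point above can be illustrated with a minimal sketch. It is not the code from the PR: only the flag name `timeToExit` comes from the quoted test line, while `wait_for_exit` and `run_exit_flag_demo` are hypothetical helpers, and the use of `std::atomic<bool>` with acquire/release ordering is an assumption about one reasonable way to apply the suggestion.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Same name as the flag in the quoted test line, but declared atomic.
// `volatile` only constrains the compiler's handling of the accesses; it does
// not make concurrent access from two threads well-defined in C++.
static std::atomic<bool> timeToExit{false};

// Hypothetical worker loop: yields until another thread sets the flag.
static void wait_for_exit() {
  while (!timeToExit.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
}

// Hypothetical driver: starts the worker, publishes the flag, joins.
// Returns true once the worker has observed the flag and finished.
static bool run_exit_flag_demo() {
  timeToExit.store(false, std::memory_order_relaxed);
  std::thread worker(wait_for_exit);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  timeToExit.store(true, std::memory_order_release);  // signal the worker
  worker.join();
  assert(timeToExit.load(std::memory_order_acquire));
  return true;
}
```

Calling `run_exit_flag_demo()` from a `main` function exercises the pattern. With a plain `volatile bool` the same loop would be a data race, so the language gives no guarantee that the worker ever sees the store.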
------------- PR Review: https://git.openjdk.org/jdk/pull/13254#pullrequestreview-1375759394 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1160389940 From fyang at openjdk.org Fri Apr 7 02:39:56 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Apr 2023 02:39:56 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 13:35:25 GMT, Per Minborg wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Add example for Option::captureStateLayout > > A review of all the copyright years shall be made in this PR. > Hi @minborg, looks like some changes were missed on riscv port. I've added these changes and submitted tests on linux-riscv. `jdk_foreign` still passed with release & fatdebug build. Could you please add these extra changes for riscv? Thanks. Here is the patch: [foreign_riscv_port_patch.txt](https://github.com/openjdk/jdk/files/11037700/foreign_riscv_port_patch.txt) @feilongjiang : Hello, the riscv-specific changes looks good to me. Thanks for the update. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1499868465 From haosun at openjdk.org Fri Apr 7 03:25:44 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 7 Apr 2023 03:25:44 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v2] In-Reply-To: References: Message-ID: > ### Background > > 1. PAC-RET branch protection was initially implemented on Linux/AArch64 in JDK-8277204 [1]. > > 2. However, it was broken with the introduction of virtual threads [2], mainly because the continuation freeze/thaw mechanism would trigger stack copying to/from memory, whereas the saved and signed LR on the stack doesn't get re-signed accordingly. > > 3. 
PR-9067 [3] tried to implement the re-sign part, but it was not accepted because option "PreserveFramePointer" is always turned on by PAC-RET but this would slow down virtual threads by ~5-20x. > > 4. As a workaround, JDK-8288023 [4] disables PAC-RET when preview language features are enabled. Note that virtual thread is one preview feature then. > > 5. Virtual thread will become a permanent feature in JDK-21 [5][6]. > > ### Goal > > This patch aims to make PAC-RET compatible with virtual threads. > > ### Requirements of virtual threads > > R-1: Option "PreserveFramePointer" should be turned off. That is, PAC-RET implementation should not rely on frame pointer FP. Otherwise, the fast path in stack copying will never be taken. > > R-2: Use some invariant values to stack copying as the modifier, so as to avoid the PAC re-sign for continuation thaw, as the fast path in stack copying doesn't walk the frame. > > Note that more details can be found in the discussion [3]. > > ### Investigation > > We considered to use (relative) stack pointer SP, thread ID, PACStack [7] and value zero as the candidate modifier. > > 1. SP: In some scenarios, we need to authenticate the return address in places where the current SP doesn't match the SP on function entry. E.g. see the usage in Runtime1::generate_handle_exception(). Hence, neither absolute nor relative SP works. > > 2. thread ID (tid): It's invariant to virtual thread, but it's nontrivial to access it from the JIT side. We need 1) firstly resolve the address of current thread (See [8] as an example), and 2) get the tid field in the way like java_lang_Thread::thread_id(). I suppose this would introduce big performance overhead. Then can we turn to use "rthread" register (JavaThread object address) as the modifier? Unfortunately, it's not an invariant to virtual threads and PAC re-sign is still needed. > > 5. PACStack uses the signed return address of caller as the modifier to sign the callee's return address. 
In this way, we get one PACed call chain. The modifier should be saved into somewhere around the frame record. Inevitably, FP should be preserved to make it easy to find this modifier in case of some exception scenarios (Recall the reason why we fail to use SP as the modifier). > > Finally, we choose to use value zero as the modifier. Trivially, it's compatible with virtual threads. However, compared to FP modifier, this solution would reduce the strength of PAC-RET protection to some extent. E.g., you get the same authentication code for each call to the function, whereas using FP gives you different codes as long as the stack depth is different. > > ### Implementation of Zero modifier > > Here list the key updates of this patch. > > 1. vm_version_aarch64.cpp > > Remove the constraint on "enable-preview" and "PreserveFramePointer". > > 2. macroAssembler_aarch64.cpp > > For utility protect_return_address(), 1) use PACIAZ/PACIZA instructions directly. 2) argument "temp_reg" is removed since all functions use the same modifier. 3) all the use sites are updated accordingly. This involves the updates in many files. > > Similar updates are done to utility authenticate_return_address(). > > Besides, aarch64.ad and AArch64TestAssembler.java are updated accordingly. > > 3. pauth_linux_aarch64.inline.hpp > > For utilities pauth_sign_return_address() and > pauth_authenticate_return_address(), remove the second argument and pass value zero to r16 register. > > Similarly, all the use sites are updated as well. This involves the updates in many files. > > 6. continuationHelper_aarch64.inline.hpp > > Introduce return_pc_at() and patch_pc_at() to avoid directly reading the saved PC or writing new signed PC on the stack in shared code. > > 7. Minor updates > > (1) sharedRuntime_aarch64.cpp: Add the missing > authenticate_return_address() use for function gen_continuation_enter(). 
In functions generate_deopt_blob() and generate_uncommon_trap_blob(), remove the authentication on the caller (3) frame since the return address is not used. > > (2) stubGenerator_aarch64.cpp: Add the missing > authenticate_return_address() use for function generate_cont_thaw(). > > (3) runtime.cpp: enable the authentication. > > ### Test > > 1. Cross compilations on arm32/s390/ppc/riscv passed. > 2. zero build and x86 build passed. > 3. tier1~3 passed on Linux/AArch64 w/ and w/o PAC-RET. > > Co-Developed-by: Nick Gasson > > [1] https://bugs.openjdk.org/browse/JDK-8277204 > [2] https://openjdk.org/jeps/425 > [3] https://github.com/openjdk/jdk/pull/9067 > [4] https://bugs.openjdk.org/browse/JDK-8288023 > [5] https://bugs.openjdk.org/browse/JDK-8301819 > [6] https://openjdk.org/jeps/444 > [7] https://www.usenix.org/conference/usenixsecurity21/presentation/liljestrand > [8] https://github.com/openjdk/jdk/pull/10441 Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Rename return_pc_at and patch_pc_at Rename return_pc_at to return_address_at. Rename patch_pc_at to patch_return_address_at. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13322/files - new: https://git.openjdk.org/jdk/pull/13322/files/0403fbde..5bd587aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13322&range=00-01 Stats: 31 lines in 12 files changed: 0 ins; 0 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/13322.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13322/head:pull/13322 PR: https://git.openjdk.org/jdk/pull/13322 From haosun at openjdk.org Fri Apr 7 03:25:45 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 7 Apr 2023 03:25:45 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 09:58:39 GMT, Hao Sun wrote: >> Using SP seems like the right way to go. Can't we compute the correct SP value to use in Runtime1::generate_handle_exception()? > >> Using SP seems like the right way to go. Can't we compute the correct SP value to use in Runtime1::generate_handle_exception()? > > Thanks for your question, Dean. > Thinking more about it, I guess we can get the expected SP value. > > In my local test, I always use `rfp + 16` to authenticate the return address, and test cases under `test/jdk/java/lang/Thread/` and `test/hotspot/jtreg/compiler/c2/` can pass except the `virtual thread` cases. I'm running tier1-3 now. > > My concern is that **using absolute SP** is incompatible with **virtual thread**, since PAC re-sign is still needed due to the stack copying process (See Requirement-2 in the commit message). Alternatively, we may want to use **relative SP** as the modifier. > > However, I didn't know how to get such an **initial SP** value. Do you have any idea? > Thanks. > @shqking, to get a relative SP, I think you would want to subtract SP from thread->last_continuation()->entry_sp(). Thanks for your hint. I will take a try. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13322#issuecomment-1499890789 From haosun at openjdk.org Fri Apr 7 03:25:48 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 7 Apr 2023 03:25:48 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v2] In-Reply-To: <_xRONirw9uCWIt7pgv8pM4TKhAdYQFPO03bHZmFXTd4=.a8706cde-1e2c-46ef-b343-12256fb2bcbd@github.com> References: <_xRONirw9uCWIt7pgv8pM4TKhAdYQFPO03bHZmFXTd4=.a8706cde-1e2c-46ef-b343-12256fb2bcbd@github.com> Message-ID: On Thu, 6 Apr 2023 02:48:03 GMT, Hao Sun wrote: >> src/hotspot/cpu/aarch64/continuationHelper_aarch64.inline.hpp line 72: >> >>> 70: >>> 71: inline address ContinuationHelper::return_pc_at(intptr_t* sp) { >>> 72: return pauth_strip_pointer(*(address*)sp); >> >> This is the return address. it's called `return_address` elsewhere. > > I used `return_pc_at()` following the usage of `ContinuationHelper::Frame::return_pc()` and `ContinuationHelper::InterpretedFrame::return_pc()`. > > How about `get_pc_at()` or `return_address_at()`? > Thanks. Rename to `return_address_at()` in the latest revision. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1160407248 From haosun at openjdk.org Fri Apr 7 03:25:50 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 7 Apr 2023 03:25:50 GMT Subject: RFR: 8287325: AArch64: fix virtual threads with -XX:UseBranchProtection=pac-ret [v2] In-Reply-To: <12VEPjXihdX9cqRjF26WdILuZmPjyqRtY3Bo4jcWVfo=.0f7c683c-929b-4b17-a7cc-f082f6dc8afe@github.com> References: <_xRONirw9uCWIt7pgv8pM4TKhAdYQFPO03bHZmFXTd4=.a8706cde-1e2c-46ef-b343-12256fb2bcbd@github.com> <12VEPjXihdX9cqRjF26WdILuZmPjyqRtY3Bo4jcWVfo=.0f7c683c-929b-4b17-a7cc-f082f6dc8afe@github.com> Message-ID: On Thu, 6 Apr 2023 09:02:13 GMT, Andrew Haley wrote: >> Thanks for your code review. 
>> >> From `ContinuationHelper::Frame::patch_pc(const frame& f, address pc)` and `void frame::patch_pc(Thread* thread, address pc)`, I thought "patch_pc" means replacing the original PC with the new one, i.e. the argument PC. >> >> In this function `ContinuationHelper::patch_pc_at(intptr_t* sp, address pc)`, I followed the notion that **updating the original PC at address SP with the argument PC**. That's why I used "patch_pc_at". >> >> If you think it's not a good name, how about `set_pc_at()`? >> Thanks. > >> Thanks for your code review. >> >> From `ContinuationHelper::Frame::patch_pc(const frame& f, address pc)` and `void frame::patch_pc(Thread* thread, address pc)`, I thought "patch_pc" means replacing the original PC with the new one, i.e. the argument PC. >> >> In this function `ContinuationHelper::patch_pc_at(intptr_t* sp, address pc)`, I followed the notion that **updating the original PC at address SP with the argument PC**. That's why I used "patch_pc_at". >> >> If you think it's not a good name, how about `set_pc_at()`? Thanks. > > A Program Counter is a physical thing. It's made of silicon and metal. It contains an address. > > `patch_return_address_at()` would be fine. Updated in the latest revision. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13322#discussion_r1160407073 From fjiang at openjdk.org Fri Apr 7 04:29:49 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 7 Apr 2023 04:29:49 GMT Subject: RFR: 8305728: RISC-V: Use bexti instruction to do single-bit testing Message-ID: Current RISC-V port tests bit masks with `andi` instruction. But for those mask values not in the range of `simm12` (`andi` only accepts sign-extended 12-bit immediate [1]), we need an extra temp register (`t0` as default for `andi`) to store the mask value [2]. 
Since we now support Zbs extension of Bit-Manipulation, we have a more convenient way to test power-of-two bit masks with the single instruction `bexti` [3] without any temp register. 1. https://github.com/riscv/riscv-isa-manual/blob/f6b8d5c7d2dcd935b48689a337c8f5bc2be4b5e5/src/rv32.tex#L519-L521 2. https://github.com/openjdk/jdk/blob/ce6e7461dc5ac56459a79e75d5de76929d1be0a3/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1852-L1860 3. https://github.com/riscv/riscv-bitmanip/blob/main/bitmanip/insns/bexti.adoc Testing: - [x] `hotspot_tier1`, `jdk_tier1` on QEMU-User w/ `UseZbs` (release build) - [ ] tier1 tests on unmatched board w/o `UseZbs` (release build) ------------- Commit messages: - add test_bit to test power of two bit mask Changes: https://git.openjdk.org/jdk/pull/13368/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13368&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305728 Stats: 86 lines in 15 files changed: 12 ins; 0 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/13368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13368/head:pull/13368 PR: https://git.openjdk.org/jdk/pull/13368 From amenkov at openjdk.org Fri Apr 7 04:51:39 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 7 Apr 2023 04:51:39 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v4] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> <6PJnRfvOuXFiO9boUg2aQevPX6458AsY1nYQNcZDX70=.faf4c611-4f5e-4258-aef2-d9154d7ff0b2@github.com> Message-ID: On Fri, 7 Apr 2023 02:27:59 GMT, Leonid Mesnik wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> trailing spaces > > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 179: > >> 177: } >> 178: >> 179: static volatile bool timeToExit = false; > > It is not enough to make variable 
volatile in C++. You need to make it atomic or use monitors to correctly synchronize. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1160432556 From amenkov at openjdk.org Fri Apr 7 04:51:36 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 7 Apr 2023 04:51:36 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v5] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - added heap scanning to report unmounted vthreads; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth; > - common code to handle stack frames is moved into a separate class; Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Use atomic for synchronization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/47657252..f7831794 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From jpai at openjdk.org Fri Apr 7 06:29:51 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 06:29:51 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References:
<5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/7adf8162...a5bb3fd9 src/java.base/share/classes/java/lang/ThreadLocal.java line 825: > 823: // switch to carrier thread to avoid recursive use of thread-locals > 824: vthread.executeOnCarrierThread(() -> { > 825: System.out.println(vthread); Hello Alan, as far as I have seen, much of a our debug logs/stacktrace in the JDK uses `System.err` to write them out. For example, `Thread.dumpStack()`, then even `java.security.debug` logging and many such places. Is it intentional that this tracing here uses `System.out` instead? src/java.base/share/classes/java/lang/ThreadLocal.java line 832: > 830: }); > 831: } catch (Exception e) { > 832: throw new InternalError(e); Should inability to log/trace `ThreadLocal` creation or `set` lead to those operations failing? Or would it be OK, if we just ignored this exception that happened when tracing/logging? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160468749 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160469548 From alanb at openjdk.org Fri Apr 7 06:38:48 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 06:38:48 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Fri, 7 Apr 2023 06:26:30 GMT, Jaikiran Pai wrote: > Or would it be OK, if we just ignored this exception that happened when tracing/logging? If, for example, something has changed System.out to be a throwing PrintStream, and this diagnostic option is enabled, then it may fail here. 
As it's just a diagnostic option, I don't think it's a big issue. Also, this code will change with a future PR that will integrate this with filtering. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160473898 From jpai at openjdk.org Fri Apr 7 06:45:52 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 06:45:52 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL.
>> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/430702c5...a5bb3fd9 src/java.base/share/classes/java/lang/ThreadLocal.java line 823: > 821: .collect(Collectors.toList())); > 822: > 823: // switch to carrier thread to avoid recursive use of thread-locals The lambda here uses `System.out` to format and print and there's a loop which converts each collected `StackFrame` to a printable `StackTraceElement`. I had a brief look at the internals of the classes involved, but couldn't spot any code that instantiates or calls `set()` on any `ThreadLocal`s. Is there any specific code here which would trigger recursive thread-local usage, or is it more a precaution to prevent any such potential usage? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160476293 From alanb at openjdk.org Fri Apr 7 06:45:55 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 06:45:55 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Fri, 7 Apr 2023 06:24:43 GMT, Jaikiran Pai wrote: > Hello Alan, as far as I have seen, much of our debug logs/stacktrace in the JDK uses `System.err` to write them out. For example, `Thread.dumpStack()`, then even `java.security.debug` logging and many such places.
Is it intentional that this tracing here uses `System.out` instead? The tracing/diagnostic options print to stdout so I just kept it consistent. As I mentioned in another comment, there is further work on filtering that will require re-visiting this and maybe we can look at having broader consistency then, just not this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160476763 From jpai at openjdk.org Fri Apr 7 06:49:51 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 06:49:51 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. 
There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/a6347823...a5bb3fd9 src/java.base/share/classes/java/lang/ThreadLocal.java line 805: > 803: /** > 804: * Reads the value of the jdk.traceVirtualThreadLocals property to determine if > 805: * a stack trace should be printed when a virtual threads sets a thread local. Trivial typo - "virtual threads sets ...." should have been "virtual thread sets ...." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160478015 From alanb at openjdk.org Fri Apr 7 06:49:54 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 06:49:54 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Fri, 7 Apr 2023 06:41:26 GMT, Jaikiran Pai wrote: > The lambda here uses `System.out` to format and print and there's a loop which converts each collect `StackFrame` to a printable `StackTraceElement`. I had a brief look at the internals of the classes involved, but couldn't spot any code that instantiates or calls `set()` on any `ThreadLocal`s. 
Is there any specific code here which would trigger recursive thread-local usage or is it more a precaution to prevent any such potential usage? Formatting is locale-specific and amounts to running arbitrary code, so this code has to defend against recursive calls. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160478956 From jpai at openjdk.org Fri Apr 7 06:59:53 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 06:59:53 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs.
>> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/c776b73f...a5bb3fd9 src/java.base/share/classes/jdk/internal/javac/PreviewFeature.java line 72: > 70: RECORD_PATTERNS, > 71: // not used > 72: VIRTUAL_THREADS, The javadoc `Preview` page lists the preview features (and the JEPs) that are part of a release. For example, for JDK 20 the page is here https://docs.oracle.com/en/java/javase/20/docs/api/preview-list.html. In a JDK image that's built out of this PR, does the missing `@JEP` annotation on this enum value cause issues in javadoc image generation or presentation on that page? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160484235 From alanb at openjdk.org Fri Apr 7 07:11:50 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 07:11:50 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Fri, 7 Apr 2023 06:56:51 GMT, Jaikiran Pai wrote: > In a JDK image that's built out of this PR, does the missing `@JEP` annotation on this enum value cause issues in javadoc image generation or presentation on that page? This has been looked at a few times and we didn't find any issues. 
Ideally the constant would be removed but we are forced to leave it for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160490631 From jpai at openjdk.org Fri Apr 7 07:11:55 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 07:11:55 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. 
> > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/24494df1...a5bb3fd9 test/jdk/com/sun/jdi/SuspendAfterDeath.java line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. This should have been `2022, 2023,` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160490224 From jpai at openjdk.org Fri Apr 7 07:28:52 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 07:28:52 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. 
>> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/a85d6cbe...a5bb3fd9 test/jdk/java/lang/Thread/java.base/jdk/internal/event/ThreadSleepEvent.java line 29: > 27: * ThreadSleepEvent to optionally throw OOME at create, begin or commit time. > 28: */ > 29: public class ThreadSleepEvent { Should this extend `jdk.internal.event.Event` and then have each of the methods have a `@Override` on them? Or would that cause some issue when this is used in a jtreg test in the `@compile` directive? 
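For readers following the question above, here is a minimal sketch of what `@Override` buys in a test stub of this kind. It does not use the real `jdk.internal.event.Event` class, which is internal to java.base; `BaseEvent`, `ThrowingSleepEvent` and the `failAt` switch are hypothetical names invented for illustration, and the actual test stub may trigger its OOME differently. The point is only that, with `@Override` in place, the compiler rejects the stub if a base-class method is renamed or its signature changes.

```java
class EventOverrideSketch {

    // Hypothetical stand-in for the internal base class; only the shape matters.
    abstract static class BaseEvent {
        void begin() { }
        void commit() { }
        boolean isEnabled() { return false; }
    }

    // Test stub that can be told to fail at a chosen point.
    static final class ThrowingSleepEvent extends BaseEvent {
        private final String failAt; // "begin", "commit", or null for no failure

        ThrowingSleepEvent(String failAt) {
            this.failAt = failAt;
        }

        @Override
        void begin() {
            if ("begin".equals(failAt)) {
                throw new OutOfMemoryError("injected at begin");
            }
        }

        @Override
        void commit() {
            if ("commit".equals(failAt)) {
                throw new OutOfMemoryError("injected at commit");
            }
        }

        @Override
        boolean isEnabled() {
            return true;
        }
    }

    // Runs begin() then commit() and reports whether the injected error surfaced.
    static boolean throwsAt(String point) {
        ThrowingSleepEvent event = new ThrowingSleepEvent(point);
        try {
            event.begin();
            event.commit();
            return false;
        } catch (OutOfMemoryError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("begin:  " + throwsAt("begin"));
        System.out.println("commit: " + throwsAt("commit"));
        System.out.println("none:   " + throwsAt(null));
    }
}
```

If `begin()` in `BaseEvent` were renamed, the stub above would stop compiling instead of silently no longer being called.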
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160500794 From duke at openjdk.org Fri Apr 7 07:34:54 2023 From: duke at openjdk.org (ExE Boss) Date: Fri, 7 Apr 2023 07:34:54 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v8] In-Reply-To: References: <4JIhKmX2VnDfArfFl-1YJfoUzGGBVA5Uvd3mdpatW-s=.5d86f29e-5475-4a4a-91df-d6418356e204@github.com> Message-ID: On Tue, 28 Mar 2023 10:00:28 GMT, Per Minborg wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 297: >> >>> 295: MethodType mtype = mh.type(); >>> 296: int[] perms = new int[mtype.parameterCount()]; >>> 297: MethodType swappedType = MethodType.methodType(mtype.returnType()); >> >> Instead of `MethodType::appendParameterTypes(...)` (which performs an expensive lookup of a new cached `MethodType` value per call), this method should instead use a `Class[]` for the arguments, which avoids that overhead inside a loop: >> >> public static MethodHandle swapArguments(MethodHandle mh, int firstArg, int secondArg) { >> MethodType mtype = mh.type(); >> int[] perms = new int[mtype.parameterCount()]; >> Class[] ptypes = new Class[perms.length]; >> for (int i = 0 ; i < perms.length ; i++) { >> int dst = i; >> if (i == firstArg) dst = secondArg; >> else if (i == secondArg) dst = firstArg; >> perms[i] = dst; >> ptypes[i] = mtype.parameterType(dst); >> } >> // This should use `JavaLangInvokeAccess` to invoke the internal >> // `MethodType.methodType(Class rtype, Class[] ptypes, boolean trusted)` >> // method with a `trusted` value of `true`: >> MethodType swappedType = MethodType.methodType(mtype.returnType(), ptypes); >> return permuteArguments(mh, swappedType, perms); >> } > > Thanks for this enhancement proposal. I hope you do not mind me asking if you could file a separate issue about this where you describe the above? We can then merge that proposal independent of this PR.
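The proposal quoted above can be tried outside the JDK with public API only. The following is a minimal sketch under stated assumptions: the class name `SwapArgumentsSketch` and the `describe` target are hypothetical, and it calls the public `MethodType.methodType(Class, Class[])` factory rather than the internal trusted overload the comment mentions, so the array is still copied once. It only illustrates building the parameter array in a single pass and handing it to `MethodHandles.permuteArguments`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class SwapArgumentsSketch {

    // Builds the permuted parameter list in one array instead of growing a
    // MethodType one parameter at a time inside the loop.
    static MethodHandle swapArguments(MethodHandle mh, int firstArg, int secondArg) {
        MethodType mtype = mh.type();
        int[] perms = new int[mtype.parameterCount()];
        Class<?>[] ptypes = new Class<?>[perms.length];
        for (int i = 0; i < perms.length; i++) {
            int dst = i;
            if (i == firstArg) dst = secondArg;
            else if (i == secondArg) dst = firstArg;
            perms[i] = dst;
            ptypes[i] = mtype.parameterType(dst);
        }
        // Public factory; the internal "trusted" overload is not reachable here.
        MethodType swappedType = MethodType.methodType(mtype.returnType(), ptypes);
        return MethodHandles.permuteArguments(mh, swappedType, perms);
    }

    // Hypothetical target used only for the demonstration.
    static String describe(String name, int count) {
        return name + count;
    }

    // Swaps the two parameters of describe and calls the result as (int, String).
    static String demo() {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    SwapArgumentsSketch.class, "describe",
                    MethodType.methodType(String.class, String.class, int.class));
            MethodHandle swapped = swapArguments(target, 0, 1);
            return (String) swapped.invoke(7, "x");
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints x7
    }
}
```

The swapped handle has type `(int, String)String`, and each incoming argument is routed back to its original position before `describe` runs.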
I don't (yet) have a **JBS** account, so it's painful to file issues because the **Java Bug Report** website[^1] supports neither **Markdown** nor **Jira** formatting (and also the `JI-*` to `JDK-*` issue transfer waiting time). [^1]: https://bugreport.java.com/bugreport/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1160504428 From jpai at openjdk.org Fri Apr 7 07:41:52 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 07:41:52 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <5qIKJpxmQtkzyHNAtMdEPV9NHYUGMXinsHk6gTOk8WM=.4ffa9d7e-ae4f-4ab3-acbd-4c2e59642659@github.com> On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes.
>> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/c2ee11c7...a5bb3fd9 test/jdk/java/lang/Thread/virtual/TraceVirtualThreadLocals.java line 52: > 50: name.get(); > 51: }); > 52: assertContains(output, "java.lang.ThreadLocal.get"); Should it also assert the presence of `ThreadLocal.setInitialValue` in the stacktrace? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160508153 From jpai at openjdk.org Fri Apr 7 07:49:58 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 07:49:58 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Fri, 7 Apr 2023 07:25:59 GMT, Jaikiran Pai wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: >> >> - Test/comments updates >> - Merge >> - Expand tests for jdk.ThreadSleep event >> - Review feedback >> - Merge >> - Fix ThreadSleepEvent again >> - Test updates >> - ThreadSleepEvent refactoring >> - Merge >> - Merge >> - ... and 1 more: https://git.openjdk.org/jdk/compare/97c49893...a5bb3fd9 > > test/jdk/java/lang/Thread/java.base/jdk/internal/event/ThreadSleepEvent.java line 29: > >> 27: * ThreadSleepEvent to optionally throw OOME at create, begin or commit time. >> 28: */ >> 29: public class ThreadSleepEvent { > > Should this extend `jdk.internal.event.Event` and then have each of the methods have a `@Override` on them? Or would that cause some issue when this is used in a jtreg test in the `@compile` directive? Same comment for the newly introduced `test/jdk/java/lang/Thread/virtual/java.base/jdk/internal/event/VirtualThreadPinnedEvent.java` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160511761 From jpai at openjdk.org Fri Apr 7 07:50:00 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 07:50:00 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Thu, 6 Apr 2023 15:49:27 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. 
This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - Merge > - Merge > - ... and 1 more: https://git.openjdk.org/jdk/compare/97c49893...a5bb3fd9 test/jdk/java/lang/Thread/virtual/YieldQueuing.java line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. Should have been `2022, 2023,` test/jdk/java/lang/management/ThreadMXBean/VirtualThreadDeadlocks.java line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. 
Same as some other files with copyright year updates, should have been `2022, 2023,` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160510856 PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160513282 From duke at openjdk.org Fri Apr 7 07:52:53 2023 From: duke at openjdk.org (ExE Boss) Date: Fri, 7 Apr 2023 07:52:53 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v8] In-Reply-To: References: <4JIhKmX2VnDfArfFl-1YJfoUzGGBVA5Uvd3mdpatW-s=.5d86f29e-5475-4a4a-91df-d6418356e204@github.com> Message-ID: On Fri, 7 Apr 2023 07:32:02 GMT, ExE Boss wrote: >> Thanks for this enhancement proposal. I hope you do not mind me asking if you could file a separate issue about this where you describe the above? We can then merge that proposal independent of this PR. > > I don't (yet) have a **JBS** account, so it's painful to file issues because the **Java Bug Report** website[^1] supports neither **Markdown** nor **Jira** formatting (and also the `JI-*` to `JDK-*` issue transfer waiting time). > > [^1]: https://bugreport.java.com/bugreport/ Filed as: [JI-9075066](https://bugs.openjdk.org/browse/JI-9075066 "[JI-9075066] Optimise `jdk.internal.foreign.abi.SharedUtils::swapArguments(...)`") ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1160514139 From alanb at openjdk.org Fri Apr 7 08:39:50 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 08:39:50 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Fri, 7 Apr 2023 07:44:39 GMT, Jaikiran Pai wrote: >> test/jdk/java/lang/Thread/java.base/jdk/internal/event/ThreadSleepEvent.java line 29: >> >>> 27: * ThreadSleepEvent to optionally throw OOME at create, begin or commit time.
>>> 28: */ >>> 29: public class ThreadSleepEvent { >> >> Should this extend `jdk.internal.event.Event` and then have each of the methods have a `@Override` on them? Or would that cause some issue when this is used in a jtreg test in the `@compile` directive? > > Same comment for the newly introduced `test/jdk/java/lang/Thread/virtual/java.base/jdk/internal/event/VirtualThreadPinnedEvent.java` > Should this extend `jdk.internal.event.Event` and then have each of the methods have a `@Override` on them? Or would that cause some issue when this is used in a jtreg test in the `@compile` directive? Okay, we can do that, as it might be cleaner for someone changing these tests in the future and would help to find issues quickly if the internal infrastructure for JFR events in java.base changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160543904 From alanb at openjdk.org Fri Apr 7 08:39:56 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 08:39:56 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <_8akvgHjBo-ri1FBYwKY9ZvnCl1wgJplmI7ws6Q4giU=.05eb7df4-06dd-402a-91a6-37cdbb8bb04e@github.com> On Fri, 7 Apr 2023 07:43:02 GMT, Jaikiran Pai wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Test/comments updates >> - Merge >> - Expand tests for jdk.ThreadSleep event >> - Review feedback >> - Merge >> - Fix ThreadSleepEvent again >> - Test updates >> - ThreadSleepEvent refactoring >> - Merge >> - Merge >> - ...
and 1 more: https://git.openjdk.org/jdk/compare/9db93ce4...a5bb3fd9 > > test/jdk/java/lang/Thread/virtual/YieldQueuing.java line 2: > >> 1: /* >> 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > > Should have been `2022, 2023,` Yes, it should. There are 200+ tests updated, the majority of which were updated with `sed` rather than manual edits, got it wrong for 5 tests it seems. I'll fix those, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160542385 From alanb at openjdk.org Fri Apr 7 08:46:52 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 08:46:52 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v6] In-Reply-To: <5qIKJpxmQtkzyHNAtMdEPV9NHYUGMXinsHk6gTOk8WM=.4ffa9d7e-ae4f-4ab3-acbd-4c2e59642659@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <5qIKJpxmQtkzyHNAtMdEPV9NHYUGMXinsHk6gTOk8WM=.4ffa9d7e-ae4f-4ab3-acbd-4c2e59642659@github.com> Message-ID: On Fri, 7 Apr 2023 07:38:31 GMT, Jaikiran Pai wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Test/comments updates >> - Merge >> - Expand tests for jdk.ThreadSleep event >> - Review feedback >> - Merge >> - Fix ThreadSleepEvent again >> - Test updates >> - ThreadSleepEvent refactoring >> - Merge >> - Merge >> - ... and 1 more: https://git.openjdk.org/jdk/compare/49869acd...a5bb3fd9 > > test/jdk/java/lang/Thread/virtual/TraceVirtualThreadLocals.java line 52: > >> 50: name.get(); >> 51: }); >> 52: assertContains(output, "java.lang.ThreadLocal.get"); > > Should it also assert the presence of `ThreadLocal.setInitialValue` in the stacktrace? 
It's the `get` method that triggers the initial value to be generated, so this is why the test checks that method. But I think you have a point: it would be clearer if the test were to match on something that has "initialValue" in the string. It means using the name of an internal method but it might be okay here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13203#discussion_r1160548887 From alanb at openjdk.org Fri Apr 7 11:54:00 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Apr 2023 11:54:00 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v7] In-Reply-To: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: <-mPFKr9vh4NoXyvJfYAlS0UmtkuxdKdk0UZpm6lcif0=.adcf09ea-f301-4c82-93d0-f6cfcc7a8b51@github.com> > JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. > > There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. > > In addition, there are a small number of implementation changes to sync up from the loom fibers branch: > > - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly.
> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. > - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. > - New system property to print a stack trace when a virtual thread sets its own value of a TL. > - ThreadPerTaskExecutor is changed to use FutureTask. > > Testing: tier1-6. Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge - Test updates to address review comments - Test/comments updates - Merge - Expand tests for jdk.ThreadSleep event - Review feedback - Merge - Fix ThreadSleepEvent again - Test updates - ThreadSleepEvent refactoring - ... and 3 more: https://git.openjdk.org/jdk/compare/a19243ec...cd680f66 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13203/files - new: https://git.openjdk.org/jdk/pull/13203/files/a5bb3fd9..cd680f66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13203&range=05-06 Stats: 3163 lines in 90 files changed: 2642 ins; 442 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/13203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13203/head:pull/13203 PR: https://git.openjdk.org/jdk/pull/13203 From jpai at openjdk.org Fri Apr 7 12:44:51 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Fri, 7 Apr 2023 12:44:51 GMT Subject: RFR: 8304919: Implementation of Virtual Threads [v7] In-Reply-To: <-mPFKr9vh4NoXyvJfYAlS0UmtkuxdKdk0UZpm6lcif0=.adcf09ea-f301-4c82-93d0-f6cfcc7a8b51@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> <-mPFKr9vh4NoXyvJfYAlS0UmtkuxdKdk0UZpm6lcif0=.adcf09ea-f301-4c82-93d0-f6cfcc7a8b51@github.com> Message-ID: 
<2mMJCOu4-A2LKcNdNqVHXGy_yg8Yncc67BysDb7sedc=.5852ccfc-c175-46c3-9bc0-a9058320215f@github.com> On Fri, 7 Apr 2023 11:54:00 GMT, Alan Bateman wrote: >> JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. >> >> There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. >> >> In addition, there are a small number of implementation changes to sync up from the loom fibers branch: >> >> - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. >> - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. >> - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. >> - New system property to print a stack trace when a virtual thread sets its own value of a TL. >> - ThreadPerTaskExecutor is changed to use FutureTask. >> >> Testing: tier1-6. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: > > - Merge > - Test updates to address review comments > - Test/comments updates > - Merge > - Expand tests for jdk.ThreadSleep event > - Review feedback > - Merge > - Fix ThreadSleepEvent again > - Test updates > - ThreadSleepEvent refactoring > - ... and 3 more: https://git.openjdk.org/jdk/compare/af05156d...cd680f66 Thank you for the updates Alan, they look fine to me. I've gone through the java side changes in this PR and the tests (except for hotspot tests) and they all look fine to me. ------------- Marked as reviewed by jpai (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13203#pullrequestreview-1376162404 From jbhateja at openjdk.org Fri Apr 7 14:22:56 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 7 Apr 2023 14:22:56 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 31 Mar 2023 12:25:16 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > small cosmetics src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 96: > 94: } > 95: Vector shufvec = this.toBitsVector(); > 96: VectorMask vecmask = shufvec.compare(VectorOperators.LT, 0); This may impact the intrinsification over AVX1 targets for floating point shuffles. Since bits vector is an integral vector and AVX1 does support 32 byte floats but not 32 byte integral vectors. 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 226: > 224: > 225: AbstractSpecies species = vspecies().asIntegral(); > 226: Vector iota = species.iota(); we can do an early exit by returning species.iota() if start == 0 and step == 1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1160650526 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1160672743 From jbhateja at openjdk.org Fri Apr 7 14:22:57 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 7 Apr 2023 14:22:57 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 12:36:04 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> small cosmetics > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 226: > >> 224: >> 225: AbstractSpecies species = vspecies().asIntegral(); >> 226: Vector iota = species.iota(); > > we can do an early exit by returning species.iota() if start == 0 and step == 1 A power-of-two step count may be replaced by left shifts. But special handling may impact the generic path; currently the C2 inline expander handles these special cases. Alternatively, we can keep this implementation as it is and enhance vector idealizations to handle identity scenarios (multiply by 1, addition by 0, shift replacement for power-of-two multiply), since their scalar counterparts do handle these cases and SLP-generated code benefits from that.
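The scalar rewrites mentioned above (multiply by 1, addition by 0, shift in place of a power-of-two multiply) can be sketched in plain Java. This is only an illustration of the arithmetic identities; the class and method names are hypothetical and are not C2 or Vector API code.

```java
// Hypothetical scalar model of the identity and strength-reduction rewrites.
final class ScalarIdealizationSketch {
    // True when c is a positive power of two (1, 2, 4, ...).
    static boolean isPowerOfTwo(int c) {
        return c > 0 && (c & (c - 1)) == 0;
    }

    // Computes x * c using the rewrites discussed in the review.
    static int mulByConstant(int x, int c) {
        if (c == 1) {
            return x; // multiply by 1 is the identity
        }
        if (isPowerOfTwo(c)) {
            // x * 2^k == x << k in two's-complement int arithmetic,
            // including when the product overflows.
            return x << Integer.numberOfTrailingZeros(c);
        }
        return x * c; // generic path
    }

    // Computes x + c, skipping the addition when c is 0.
    static int addConstant(int x, int c) {
        return c == 0 ? x : x + c;
    }
}
```

Each rewritten form returns the same value as the generic expression, which is why such rewrites are safe for a compiler to apply whenever the constant is known.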
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1160706670 From qamai at openjdk.org Fri Apr 7 17:13:50 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 7 Apr 2023 17:13:50 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: Message-ID: > Hi, > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {external_word} > vpackusdw %xmm0,%xmm0,%xmm0 > vpackuswb %xmm0,%xmm0,%xmm0 > vpmovsxbd %xmm0,%xmm3 > vpcmpgtd %xmm3,%xmm1,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fc2acb4e0d8 > vpmovzxbd %xmm0,%xmm0 > vpermd %ymm2,%ymm0,%ymm0 > movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} > vmovdqu %xmm0,0x10(%r10) > > After: > movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} > vmovdqu 0x10(%r10),%xmm2 > vpxor %xmm0,%xmm0,%xmm0 > vpcmpgtd %xmm2,%xmm0,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fa818b27cb1 > vpermd %ymm1,%ymm2,%ymm0 > movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} > vmovdqu %xmm0,0x10(%r10) > > Please take a look and leave reviews. Thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: special case iotaShuffle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13093/files - new: https://git.openjdk.org/jdk/pull/13093/files/97c8fabf..079a6b5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=05-06 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13093/head:pull/13093 PR: https://git.openjdk.org/jdk/pull/13093 From qamai at openjdk.org Fri Apr 7 17:14:14 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 7 Apr 2023 17:14:14 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 13:36:22 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 226: >> >>> 224: >>> 225: AbstractSpecies species = vspecies().asIntegral(); >>> 226: Vector iota = species.iota(); >> >> we can do an early exist by returning species..iota() if start = 0 and step = 1 > > Power of two step count may be replaced by logical right shifts. But special handling may impact generic path > , currently c2 inline expander handles these special cases. > > Alternatively we can keep this implementation at its and enhance vector idealizations to handle identity scenarios, multiply by 1, addition by 0, shift replacement for power of two multiply, since their scalar counterparts do handle these cases and SLP generated code gets a benefit of that. Thanks a lot for your review, I think that transforming a multiplication by a power of 2 into a shift can be done by the C2 compiler. I have added the special case for `start = 0 && step == 1` since it may be more common and can be optimised away when the arguments are constants. 
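A minimal scalar sketch of the special case described above, assuming a wrapping iota whose lane `i` holds `start + i * step` reduced into `[0, vlen)`. The class, the method names and the plain `int[]` representation are hypothetical stand-ins; the actual change lives in the Vector API implementation and is not reproduced here.

```java
// Hypothetical scalar model of an iota shuffle with an identity fast path.
final class IotaShuffleSketch {
    // Indices 0..vlen-1, standing in for the species' plain iota vector.
    static int[] identity(int vlen) {
        int[] idx = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            idx[i] = i;
        }
        return idx;
    }

    // Lane i holds (start + i * step) wrapped into [0, vlen).
    static int[] iotaWrapped(int vlen, int start, int step) {
        int[] iota = identity(vlen);
        if (start == 0 && step == 1) {
            return iota; // early exit: no multiply, add or wrap needed
        }
        for (int i = 0; i < vlen; i++) {
            iota[i] = Math.floorMod(start + iota[i] * step, vlen);
        }
        return iota;
    }
}
```

When both arguments are compile-time constants, a check of this shape can be folded away, so the fast path adds no cost to the common case.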
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1160841447 From qamai at openjdk.org Fri Apr 7 18:06:53 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 7 Apr 2023 18:06:53 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 11:51:21 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> small cosmetics > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 96: > >> 94: } >> 95: Vector shufvec = this.toBitsVector(); >> 96: VectorMask vecmask = shufvec.compare(VectorOperators.LT, 0); > > This may impact the intrinsification over AVX1 targets for floating point shuffles. Since bits vector is an integral vector and AVX1 does support 32 byte floats but not 32 byte integral vectors. Yes I think it is a drawback of this approach, however currently we do not support shuffling for 256-bit vectors on AVX1 machines either, and AVX1 seems to be a special case in this regard. This species of float and double may also be less common in the usage of Vector API since it is larger than SPECIES_PREFERRED. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1160868954 From matsaave at openjdk.org Fri Apr 7 19:20:44 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 7 Apr 2023 19:20:44 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 03:32:27 GMT, Ioi Lam wrote: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. 
> > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Great cleanup! Making CDS easier to read and use is always a plus. Just some observations/nits: ------------- PR Comment: https://git.openjdk.org/jdk/pull/13284#issuecomment-1500569237 From matsaave at openjdk.org Fri Apr 7 19:57:48 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 7 Apr 2023 19:57:48 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 03:32:27 GMT, Ioi Lam wrote: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. 
> > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Changes requested by matsaave (Committer). src/hotspot/share/cds/archiveBuilder.cpp line 1086: > 1084: p2i(to_requested(start)), size_t(end - start)); > 1085: log_data(start, end, to_requested(start), /*is_heap=*/true); > 1086: } These log messages can be placed inside the else case before the break src/hotspot/share/cds/archiveHeapWriter.cpp line 369: > 367: template void ArchiveHeapWriter::store_requested_oop_in_buffer(T* buffered_addr, > 368: oop request_oop) { > 369: //assert(is_in_requested_regions(request_oop), "must be"); Some left over commented code. I assume this should be removed or a new assert should be here to replace it. 
src/hotspot/share/cds/archiveHeapWriter.cpp line 529: > 527: num_non_null_ptrs ++; > 528: > 529: if (max_idx < idx) { Is there a built in min() function we can use here? Maybe std::min()? src/hotspot/share/cds/filemap.cpp line 1674: > 1672: > 1673: char* buffer = NEW_C_HEAP_ARRAY(char, size_in_bytes, mtClassShared); > 1674: size_t written = write_bitmap(ptrmap, buffer, 0); Maybe add a comment to clarify there is no offset? Constants in method parameters can be confusing sometimes. src/hotspot/share/cds/filemap.cpp line 2035: > 2033: } > 2034: if (end < e) { > 2035: end = e; Like mentioned before, maybe we have max() and min() methods to use here. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 520: > 518: } else { > 519: return true; > 520: } Maybe make this `return reserved.contains(range.start()) && reserved.contains(range.last())` ------------- PR Review: https://git.openjdk.org/jdk/pull/13284#pullrequestreview-1376508819 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1160911406 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1160911559 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1160913791 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1160916309 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1160916888 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1160924110 From lmesnik at openjdk.org Fri Apr 7 21:23:42 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 7 Apr 2023 21:23:42 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects Message-ID: Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. 
------------- Commit messages: - fixed EA - Merge branch 'master' of https://github.com/openjdk/jdk into 8277573 - 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects Changes: https://git.openjdk.org/jdk/pull/13312/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13312&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8277573 Stats: 78 lines in 11 files changed: 69 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13312.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13312/head:pull/13312 PR: https://git.openjdk.org/jdk/pull/13312 From lmesnik at openjdk.org Fri Apr 7 21:23:43 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 7 Apr 2023 21:23:43 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. Thanks to Vladimir K for fix in escape.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13312#issuecomment-1500656370 From lmesnik at openjdk.org Fri Apr 7 21:48:43 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 7 Apr 2023 21:48:43 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: <3MwcnyFKbGUB420hx0RykEG43HA9Uyra08ZRFQUms7Y=.4dada609-deee-49d7-a21b-e29c8ecd3d36@github.com> On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. Tested with all CI tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13312#issuecomment-1500670951 From kvn at openjdk.org Fri Apr 7 22:23:42 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Apr 2023 22:23:42 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13312#pullrequestreview-1376633778 From wkemper at openjdk.org Fri Apr 7 23:09:40 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Apr 2023 23:09:40 GMT Subject: RFR: 8305767: HdrSeq: support for a merge() method Message-ID: A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). 
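To illustrate why plain summary statistics merge easily while decaying ones do not, here is a small Java sketch. It is not the HdrSeq code from the PR: the class, its fields and the decay factor are assumptions made for the example. Count, sum, sum of squares and maximum combine exactly, whereas a decaying average depends on the order in which samples arrived, so the sketch marks it as NaN after a merge.

```java
// Hypothetical summary of a number sequence that supports merge().
final class SeqStats {
    private static final double ALPHA = 0.3; // assumed decay factor

    private long num;
    private double sum;
    private double sumOfSquares;
    private double max = Double.NEGATIVE_INFINITY;
    private double decayingAvg; // order-dependent, so not mergeable

    void add(double v) {
        decayingAvg = (num == 0) ? v : ALPHA * v + (1 - ALPHA) * decayingAvg;
        num++;
        sum += v;
        sumOfSquares += v * v;
        max = Math.max(max, v);
    }

    // Folds another summary into this one.
    void merge(SeqStats other) {
        num += other.num;
        sum += other.sum;
        sumOfSquares += other.sumOfSquares;
        max = Math.max(max, other.max);
        decayingAvg = Double.NaN; // interleaving of samples is unknown
    }

    long num() { return num; }
    double max() { return max; }
    double decayingAvg() { return decayingAvg; }

    double avg() {
        return num == 0 ? 0.0 : sum / num;
    }

    // Population variance, clamped at zero against rounding error.
    double variance() {
        if (num == 0) {
            return 0.0;
        }
        double mean = avg();
        return Math.max(0.0, sumOfSquares / num - mean * mean);
    }
}
```

Merging two summaries built from {1, 2, 3} and {4, 5} gives the same count, average, variance and maximum as a single summary built from all five samples.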
------------- Commit messages: - 8298597: HdrSeq: support for a merge() method Changes: https://git.openjdk.org/jdk/pull/13395/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13395&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305767 Stats: 180 lines in 5 files changed: 153 ins; 2 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/13395.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13395/head:pull/13395 PR: https://git.openjdk.org/jdk/pull/13395 From stuefe at openjdk.org Sat Apr 8 07:17:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 8 Apr 2023 07:17:44 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 03:32:27 GMT, Ioi Lam wrote: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. 
As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 This looks like a nice simplification. Will you also combine all mappings at OS level to a single one, so that you only need one mmap call? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13284#issuecomment-1500812579 From jwaters at openjdk.org Sat Apr 8 13:24:37 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 8 Apr 2023 13:24:37 GMT Subject: RFR: 8305341: Alignment should be enforced by alignas instead of compiler specific attributes [v3] In-Reply-To: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> References: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> Message-ID: > C11 has been stable for a long time on all platforms, so native code can use the standard alignas operator for alignment requirements Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Semicolon ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13258/files - new: https://git.openjdk.org/jdk/pull/13258/files/7dc7f7d8..07f5c702 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13258&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13258&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13258/head:pull/13258 PR: https://git.openjdk.org/jdk/pull/13258 From lgxbslgx at gmail.com Sun Apr 9 15:02:55 2023 From: lgxbslgx at gmail.com (Guoxiong Li) Date: Sun, 9 Apr 2023 23:02:55 +0800 Subject: [Investigation] Considering using a hashtable to store the signature handlers Message-ID: Hi all, I notice these are two arrays 
`_fingerprints` and `_handlers` in class `SignatureHandlerLibrary`[1] which are used to store the addresses of the signature handlers of native methods. But this does not seem very efficient, because the worst-case search time is O(n). If the most recently created handlers are the ones most likely to be used, their lookups will always cost O(n), because newly created handlers are appended to the end of the array and the search runs from the first element to the last. So I want to use a hashtable instead of arrays to store the signature handlers. Each entry of such a hashtable would contain at least the fingerprint and the handler address. What is your opinion? Any ideas are appreciated. Best Regards. -- Guoxiong [1] https://github.com/openjdk/jdk/blob/50d73352068f588cf6db29acb56e21b0412ab768/src/hotspot/share/interpreter/interpreterRuntime.hpp#L170 From jbhateja at openjdk.org Mon Apr 10 15:14:51 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 10 Apr 2023 15:14:51 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 17:12:08 GMT, Quan Anh Mai wrote: >> Power of two step count may be replaced by logical right shifts. But special handling may impact generic path >> , currently c2 inline expander handles these special cases. >> >> Alternatively we can keep this implementation at its and enhance vector idealizations to handle identity scenarios, multiply by 1, addition by 0, shift replacement for power of two multiply, since their scalar counterparts do handle these cases and SLP generated code gets a benefit of that. > > Thanks a lot for your review, I think that transforming a multiplication by a power of 2 into a shift can be done by the C2 compiler.
I have added the special case for `start = 0 && step == 1` since it may be more common and can be optimised away when the arguments are constants. For x86 byte vector multiplication is done at granularity of short lanes, this case shows regression with power of two multiplications which are strength reduced to shifts currently. please file a follow up bug report for this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161806381 From jbhateja at openjdk.org Mon Apr 10 17:24:54 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 10 Apr 2023 17:24:54 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 18:04:16 GMT, Quan Anh Mai wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 96: >> >>> 94: } >>> 95: Vector shufvec = this.toBitsVector(); >>> 96: VectorMask vecmask = shufvec.compare(VectorOperators.LT, 0); >> >> This may impact the intrinsification over AVX1 targets for floating point shuffles. Since bits vector is an integral vector and AVX1 does support 32 byte floats but not 32 byte integral vectors. > > Yes I think it is a drawback of this approach, however currently we do not support shuffling for 256-bit vectors on AVX1 machines either, and AVX1 seems to be a special case in this regard. This species of float and double may also be less common in the usage of Vector API since it is larger than SPECIES_PREFERRED. Hi @merykitty , Agree with you that SPECIES_PREFERRED is preferred for vector algorithms intercepting both integral and floating point vectors. 
FTR, we see a perf regression with Float256 based micro now on AVX=1 targets, public static short micro() { VectorShuffle iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true); return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1); } CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef CompileCommand: compileonly shufflef.micro bool compileonly = true ** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0 ** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0 @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) @ 34 java.lang.Object::getClass (0 bytes) (intrinsic) @ 54 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) [time] 386ms [res]3392 CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/ CPROMPT>export PATH=$JAVA_HOME/bin:$PATH CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . 
shufflef CompileCommand: compileonly shufflef.micro bool compileonly = true WARNING: Using incubator modules: jdk.incubator.vector @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) @ 17 jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes) (intrinsic) @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) [time] 7ms [res]3392 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161810585 From qamai at openjdk.org Mon Apr 10 18:43:25 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 10 Apr 2023 18:43:25 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Mon, 10 Apr 2023 15:11:55 GMT, Jatin Bhateja wrote: >> Thanks a lot for your review, I think that transforming a multiplication by a power of 2 into a shift can be done by the C2 compiler. I have added the special case for `start = 0 && step == 1` since it may be more common and can be optimised away when the arguments are constants. > > For x86 byte vector multiplication is done at granularity of short lanes, this case shows regression with power of two multiplications which are strength reduced to shifts currently. please file a follow up bug report for this. 
@jatin-bhateja I have created [JDK-8305810](https://bugs.openjdk.org/browse/JDK-8305810) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161976583 From sspitsyn at openjdk.org Mon Apr 10 18:53:04 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 10 Apr 2023 18:53:04 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. It looks pretty good. But I'd like to request a couple of changes. The name `notify_allocation` sounds too generic. How about replacing it with `notify_jvmti_vm_object_alloc`? I'll post another request separately. src/hotspot/share/opto/library_call.cpp line 2856: > 2854: set_result(ideal.value(result)); > 2855: return true; > 2856: #else Nit: It is better to replace #else at 2856 with #endif. Then #endif at 2859 is not needed. ------------- Changes requested by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13312#pullrequestreview-1377960281 PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1161981967 From sspitsyn at openjdk.org Mon Apr 10 19:00:58 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 10 Apr 2023 19:00:58 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. src/hotspot/share/prims/jvmtiEventController.cpp line 727: > 725: JvmtiExport::set_should_post_on_exceptions((any_env_thread_enabled & SHOULD_POST_ON_EXCEPTIONS_BITS) != 0); > 726: > 727: JvmtiExport::_should_post_allocation_notifications = JvmtiExport::should_post_vm_object_alloc(); I'm not sure why this flag is needed.
It looks like a dup of `JvmtiExport::should_post_vm_object_alloc()`. Can we just replace it with `JvmtiExport::should_post_vm_object_alloc()`? test/hotspot/jtreg/ProblemList-Xcomp.txt line 41: > 39: serviceability/sa/TestJhsdbJstackMixed.java 8248675 linux-aarch64 > 40: > 41: serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java 8288430 generic-all If the 8288430 is a dup of 8277573 then should we close it as such? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1161989234 PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1161990486 From qamai at openjdk.org Mon Apr 10 19:05:35 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 10 Apr 2023 19:05:35 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Mon, 10 Apr 2023 15:16:59 GMT, Jatin Bhateja wrote: >> Yes I think it is a drawback of this approach, however currently we do not support shuffling for 256-bit vectors on AVX1 machines either, and AVX1 seems to be a special case in this regard. This species of float and double may also be less common in the usage of Vector API since it is larger than SPECIES_PREFERRED. > > Hi @merykitty , Agree with you that SPECIES_PREFERRED is preferred for vector algorithms intercepting both integral and floating point vectors. > > FTR, we see a perf regression with Float256 based micro now on AVX=1 targets, > > > public static short micro() { > VectorShuffle iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true); > return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1); > } > > CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . 
shufflef > CompileCommand: compileonly shufflef.micro bool compileonly = true > ** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0 > ** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0 > @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) > @ 34 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 54 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) > @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) > [time] 386ms [res]3392 > CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/ > CPROMPT>export PATH=$JAVA_HOME/bin:$PATH > CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . 
shufflef > CompileCommand: compileonly shufflef.micro bool compileonly = true > WARNING: Using incubator modules: jdk.incubator.vector > @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) > @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) > @ 17 jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes) (intrinsic) > @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) > [time] 7ms [res]3392 I see, what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161994748 From lmesnik at openjdk.org Mon Apr 10 22:07:38 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 10 Apr 2023 22:07:38 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 10 Apr 2023 18:46:42 GMT, Serguei Spitsyn wrote: >> Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. > > src/hotspot/share/opto/library_call.cpp line 2856: > >> 2854: set_result(ideal.value(result)); >> 2855: return true; >> 2856: #else > > Nit: It is better to replace #else at 2856 with #endif. Then #endif at 2859 is not needed. In this case the code is: ``` set_result(ideal.value(result)); return true; set_result(obj); return true; ``` Which might cause compiler warnings and complaints from static analyzers. > src/hotspot/share/prims/jvmtiEventController.cpp line 727: > >> 725: JvmtiExport::set_should_post_on_exceptions((any_env_thread_enabled & SHOULD_POST_ON_EXCEPTIONS_BITS) != 0); >> 726: >> 727: JvmtiExport::_should_post_allocation_notifications = JvmtiExport::should_post_vm_object_alloc(); > > I'm not sure why this flag is needed. It looks like a dup of `JvmtiExport::should_post_vm_object_alloc()`.
> Can we just replace it with `JvmtiExport::should_post_vm_object_alloc()`? I don't think we can replace it with a function. Also, I think it will be needed later to add the SampledObjectAlloc event here. It should consider VM internal object allocations along with all allocations. > test/hotspot/jtreg/ProblemList-Xcomp.txt line 41: > >> 39: serviceability/sa/TestJhsdbJstackMixed.java 8248675 linux-aarch64 >> 40: >> 41: serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java 8288430 generic-all > > If the 8288430 is a dup of 8277573 then should we close it as such? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1162123421 PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1162124409 PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1162124580 From sspitsyn at openjdk.org Tue Apr 11 00:00:36 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Apr 2023 00:00:36 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 10 Apr 2023 22:02:10 GMT, Leonid Mesnik wrote: >> src/hotspot/share/opto/library_call.cpp line 2856: >> >>> 2854: set_result(ideal.value(result)); >>> 2855: return true; >>> 2856: #else >> >> Nit: It is better to replace #else at 2856 with #endif. Then #endif at 2859 is not needed. > > In this case the code is: > ``` > set_result(ideal.value(result)); > return true; > set_result(obj); > return true; > ``` > > Which might cause compiler warnings and complaints from static analyzers. I wonder if it is possible to do the following: + . . . . . . . + final_sync(ideal); + obj = ideal.value(result); + return true; + #endif // INCLUDE_JVMTI set_result(obj); return true; But I'm not sure if one more sync is needed in such a case.
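The shape being discussed here, where only the JVMTI-specific work sits inside the conditional block and a single `set_result`/`return true` pair follows it, can be sketched as below. This is a minimal illustration under stated assumptions, not the code from the PR: `finish_allocation_intrinsic`, `last_result`, the `int`-typed `obj`, and the `+= 1000` stand-in are all hypothetical, and `INCLUDE_JVMTI` is defined locally only so the sketch compiles on its own.

```cpp
#include <cassert>

#ifndef INCLUDE_JVMTI
#define INCLUDE_JVMTI 1  // assumption for this sketch; the real macro comes from the HotSpot build
#endif

// Hypothetical stand-ins for the intrinsic's state; none of these are HotSpot APIs.
static int last_result = -1;
static void set_result(int value) { last_result = value; }

// Publishes a result and returns true, with one exit path shared by both configurations.
bool finish_allocation_intrinsic(int obj, bool should_notify) {
  assert(obj >= 0);
#if INCLUDE_JVMTI
  if (should_notify) {
    obj += 1000;  // stand-in for the value selected after the notification path
  }
#else
  (void)should_notify;  // unused when JVMTI support is compiled out
#endif // INCLUDE_JVMTI
  set_result(obj);      // reached in every configuration, so nothing is unreachable
  return true;
}
```

Because the conditional block only updates `obj`, no second `return` can end up as dead code whichever way the macro is set.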
At least, this line should not be under #if/#else: ` return true;` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1162175803 From sspitsyn at openjdk.org Tue Apr 11 00:00:39 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Apr 2023 00:00:39 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. src/hotspot/share/prims/jvmtiExport.cpp line 1050: > 1048: > 1049: // This flag is read by C2 during VM internal objects allocation > 1050: bool JvmtiExport::_should_post_allocation_notifications = true; Needs to be initialized with `false`. src/hotspot/share/prims/jvmtiExport.hpp line 400: > 398: > 399: // Used by C2 to post vm_object_alloc > 400: static bool _should_post_allocation_notifications; As we privately discussed, consider replacing it with something like: `_should_notify_vm_object_alloc`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1162176341 PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1162177316 From sspitsyn at openjdk.org Tue Apr 11 00:12:38 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Apr 2023 00:12:38 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 10 Apr 2023 22:03:56 GMT, Leonid Mesnik wrote: >> src/hotspot/share/prims/jvmtiEventController.cpp line 727: >> >>> 725: JvmtiExport::set_should_post_on_exceptions((any_env_thread_enabled & SHOULD_POST_ON_EXCEPTIONS_BITS) != 0); >>> 726: >>> 727: JvmtiExport::_should_post_allocation_notifications = JvmtiExport::should_post_vm_object_alloc(); >> >> I'm not sure why this flag is needed. It looks like a dup of `JvmtiExport::should_post_vm_object_alloc()`.
>> Can we just replace it with `JvmtiExport::should_post_vm_object_alloc()`? > > I don't think we could replace it by function. Also, I think that it is needed later to add SampledObjectAlloc event here. It should consider VM internal object allocations along with all allocations. I was thinking about an offset. But I've got your plan to use this flag for both `VMObjectAlloc` and `SampledObjectAlloc` event types. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13312#discussion_r1162181763 From lmesnik at openjdk.org Tue Apr 11 01:05:49 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 11 Apr 2023 01:05:49 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed after Sergey's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13312/files - new: https://git.openjdk.org/jdk/pull/13312/files/9d13e058..82aa5d93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13312&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13312&range=00-01 Stats: 23 lines in 9 files changed: 0 ins; 4 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/13312.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13312/head:pull/13312 PR: https://git.openjdk.org/jdk/pull/13312 From lmesnik at openjdk.org Tue Apr 11 01:05:51 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 11 Apr 2023 01:05:51 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. 
I have simplified #if #else #endif and renamed functions and variables. Also, changed _should_notify_object_alloc default value to false. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13312#issuecomment-1502535015 From dholmes at openjdk.org Tue Apr 11 03:09:36 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 11 Apr 2023 03:09:36 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v5] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Fri, 7 Apr 2023 04:51:36 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - added heap scanning to report unmounted vthreads; >> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth; >> - common code to handle stack frames is moved into a separate class; > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Use atomic for synchronization Seems to me the bug report is asking for unmounted virtual threads to be considered roots - but virtual threads are deliberately not roots. Any unmounted virtual thread should be reachable from either the scheduler or whatever object the VT is parked on, so if they are not showing up then perhaps the wrong reference is being followed. If there is a bug/missing-functionality in the FollowReferences implementation then fixing it is of course fine, but that is not what the bug report seems to be about.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1502624570 From sspitsyn at openjdk.org Tue Apr 11 04:16:37 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Apr 2023 04:16:37 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 01:05:49 GMT, Leonid Mesnik wrote: >> Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed after Sergey's comments Thank you for the update. Looks good now. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13312#pullrequestreview-1378422184 From kvn at openjdk.org Tue Apr 11 04:32:35 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Apr 2023 04:32:35 GMT Subject: RFR: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 01:05:49 GMT, Leonid Mesnik wrote: >> Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed after Sergey's comments Update looks good to me. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13312#pullrequestreview-1378431164 From alanb at openjdk.org Tue Apr 11 05:53:57 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 11 Apr 2023 05:53:57 GMT Subject: Integrated: 8304919: Implementation of Virtual Threads In-Reply-To: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> References: <5i_MXEpA1DKDXRb40oNKuNkO8Lx5cxVGAi2cd0xQB8s=.f7c43207-d81a-4a75-89d2-a2877269d5f9@github.com> Message-ID: On Tue, 28 Mar 2023 07:28:01 GMT, Alan Bateman wrote: > JEP 444 proposes to make virtual threads a permanent feature in Java 21. The APIs that were preview APIs in Java 19/20 are changed to permanent and their `@since`/equivalent are changed to 21 (as per the guidance in JEP 12). The JNI and JVMTI versions are bumped as this is the first change in 21 to need the new version number. A lot of tests are updated to drop `@enablePreview` and --enable-preview. > > There is one API change from Java 19/20, the preview API Thread.Builder.allowSetThreadLocals(boolean) is dropped. This requires an update to the JVMTI GetThreadInfo implementation to read the TCCL consistently. > > In addition, there are a small number of implementation changes to sync up from the loom fibers branch: > > - A number of stack frames are `@Hidden` to reduce noise in the stack traces. This exposed a few issues with the stack walker code. More specifically, the cases where end of a continuation falls precisely at the end of the batch, or where the remaining frames are hidden, weren't handled correctly. > - The code to emit the JFR jdk.ThreadSleepEvent is refactored so it's in Thread rather than in two classes. > - A few robustness improvements for OOME and SOE. There is more to do here, for future PRs. > - New system property to print a stack trace when a virtual thread sets its own value of a TL. > - ThreadPerTaskExecutor is changed to use FutureTask. > > Testing: tier1-6. 
This pull request has now been integrated. Changeset: 2586f361 Author: Alan Bateman URL: https://git.openjdk.org/jdk/commit/2586f36120317cd206464b1e79d3906f711487cb Stats: 2275 lines in 206 files changed: 903 ins; 866 del; 506 mod 8304919: Implementation of Virtual Threads Reviewed-by: lmesnik, cjplummer, psandoz, mchung, sspitsyn, jpai ------------- PR: https://git.openjdk.org/jdk/pull/13203 From aph-open at littlepinkcloud.com Tue Apr 11 08:59:39 2023 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Tue, 11 Apr 2023 09:59:39 +0100 Subject: [Investigation] Considering using a hashtable to store the signature handlers In-Reply-To: References: Message-ID: <7d49663e-6a97-c1ff-e41e-cab3c04c3f26@littlepinkcloud.com> On 4/9/23 16:02, Guoxiong Li wrote: > What is your opinion? > Any ideas are appreciated. I would measure the time taken for the operations of insertion and lookup over a realistic range. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From qamai at openjdk.org Tue Apr 11 09:38:50 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 11 Apr 2023 09:38:50 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: On Mon, 10 Apr 2023 15:16:59 GMT, Jatin Bhateja wrote: >> Yes I think it is a drawback of this approach, however currently we do not support shuffling for 256-bit vectors on AVX1 machines either, and AVX1 seems to be a special case in this regard. This species of float and double may also be less common in the usage of Vector API since it is larger than SPECIES_PREFERRED. > > Hi @merykitty , Agree with you that SPECIES_PREFERRED is preferred for vector algorithms intercepting both integral and floating point vectors. 
> > FTR, we see a perf regression with Float256 based micro now on AVX=1 targets, > > > public static short micro() { > VectorShuffle iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true); > return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1); > } > > CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef > CompileCommand: compileonly shufflef.micro bool compileonly = true > ** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0 > ** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0 > @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) > @ 34 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 54 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) > @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) > [time] 386ms [res]3392 > CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/ > CPROMPT>export PATH=$JAVA_HOME/bin:$PATH > CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . 
shufflef > CompileCommand: compileonly shufflef.micro bool compileonly = true > WARNING: Using incubator modules: jdk.incubator.vector > @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) > @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) > @ 17 jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes) (intrinsic) > @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) > @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) > @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) > [time] 7ms [res]3392 @jatin-bhateja Since `Float256Shuffle` is represented as a 256-bit int vector, which is not supported by AVX1, the compiled code falls back to Java implementation, which explains the regression. However, having a `VectorShuffle` but not for `Vector::rearrange` is not really useful, and the code snippet is similar to `ShortVector.SPECIES_128.iotaShuffle(0, 1, true).toVector().reinterpretAsShorts().lane(1)`. As a result, I think having some regressions in edge cases of AVX1 is acceptable in contrast with the improvement in all other operations on all platforms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1162555106 From jsjolen at openjdk.org Tue Apr 11 09:42:05 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 11 Apr 2023 09:42:05 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v2] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/riscv. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. 
No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12324/files - new: https://git.openjdk.org/jdk/pull/12324/files/cf24946e..5d2786ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12324&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12324&range=00-01 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12324/head:pull/12324 PR: https://git.openjdk.org/jdk/pull/12324 From jsjolen at openjdk.org Tue Apr 11 09:42:07 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 09:42:07 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v2] In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 03:38:55 GMT, Fei Yang wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes > > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 1616: > >> 1614: beqz(t0, L); >> 1615: stop("InterpreterMacroAssembler::call_VM_leaf_base:" >> 1616: " last_sp != null"); > > Maybe: " last_sp isn't null" ? Agreed.
> src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 1643: > >> 1641: beqz(t0, L); >> 1642: stop("InterpreterMacroAssembler::call_VM_base:" >> 1643: " last_sp != null"); > > Similar here. Maybe: " last_sp isn't null" ? Agreed. > src/hotspot/cpu/riscv/interpreterRT_riscv.cpp line 230: > >> 228: virtual void pass_object() { >> 229: intptr_t* addr = single_slot_addr(); >> 230: intptr_t value = *addr == 0 ? nullptr : (intptr_t)addr; > > PS: I got compile errors when doing a native build with GCC-11.3.0: > > 474 /home/fyang/openjdk-jdk/src/hotspot/cpu/riscv/interpreterRT_riscv.cpp: In member function 'virtual void SlowSignatureHandler::pass_object()': > 475 /home/fyang/openjdk-jdk/src/hotspot/cpu/riscv/interpreterRT_riscv.cpp:230:33: error: operands to '?:' have different types 'std::nullptr_t' and 'intptr_t' {aka 'long int'} > 476 230 | intptr_t value = *addr == 0 ? nullptr : (intptr_t)addr; > 477 | ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~ > > > I think we should change the "nullptr" here into "(intptr_t)nullptr". Yeah, that's correct. Fixed! 
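The GCC error quoted above comes from the two arms of `?:` having unrelated types. A minimal sketch of the suggested fix follows; `slot_value` is a hypothetical free function used only for illustration and is not the HotSpot `pass_object()` member.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the shape of the quoted line: yields 0 when the
// slot holds 0, otherwise the slot's own address converted to an integer.
inline intptr_t slot_value(intptr_t* addr) {
  assert(addr != nullptr);
  // nullptr has type std::nullptr_t, which does not implicitly convert to
  // intptr_t, so `*addr == 0 ? nullptr : (intptr_t)addr` has no common type.
  // Casting the null arm explicitly makes both arms intptr_t.
  return *addr == 0 ? (intptr_t)nullptr : (intptr_t)addr;
}
```

The C-style cast on `nullptr` is permitted because it resolves to a `reinterpret_cast` of `std::nullptr_t` to an integral type; writing a plain `0` for that arm would presumably work as well.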
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12324#discussion_r1162556256 PR Review Comment: https://git.openjdk.org/jdk/pull/12324#discussion_r1162556345 PR Review Comment: https://git.openjdk.org/jdk/pull/12324#discussion_r1162557096 From jsjolen at openjdk.org Tue Apr 11 09:42:07 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 09:42:07 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v2] In-Reply-To: <9daAIvq4DDVLuvsw9JjFK5Ap3KGkCkuoAEoYuVhbBto=.f8a06117-49e4-4a75-80aa-e07338388581@github.com> References: <9daAIvq4DDVLuvsw9JjFK5Ap3KGkCkuoAEoYuVhbBto=.f8a06117-49e4-4a75-80aa-e07338388581@github.com> Message-ID: On Wed, 29 Mar 2023 02:35:15 GMT, David Holmes wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1836: > >> 1834: __ bne(t1, scratch_src_klass, L_failed); >> 1835: >> 1836: // if [src->is_Array() != null] then return -1 > > nullptr for code fragment Hi, I changed all of these to "if X is/isn't null" in the previous instances and in this one. Does that sound OK to you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12324#discussion_r1162555612 From jsjolen at openjdk.org Tue Apr 11 09:49:03 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 09:49:03 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v3] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/riscv. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1.
No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge remote-tracking branch 'origin/master' into JDK-8301496 - Fixes - Merge remote-tracking branch 'origin/master' into JDK-8301496 - Fixes - Merge remote-tracking branch 'origin/master' into JDK-8301496 - Replace NULL with nullptr in cpu/riscv ------------- Changes: https://git.openjdk.org/jdk/pull/12324/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12324&range=02 Stats: 573 lines in 45 files changed: 0 ins; 0 del; 573 mod Patch: https://git.openjdk.org/jdk/pull/12324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12324/head:pull/12324 PR: https://git.openjdk.org/jdk/pull/12324 From jsjolen at openjdk.org Tue Apr 11 10:11:45 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 10:11:45 GMT Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390 Message-ID: Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
Here are some typical things to look out for: 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. An example of this: ```c++ // This function returns null void* ret_null(); // This function returns true if *x == nullptr bool is_nullptr(void** x); ``` Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. Thanks! ------------- Commit messages: - Merge remote-tracking branch 'origin/master' into JDK-8301497 - Fixes - Replace NULL with nullptr in cpu/s390 Changes: https://git.openjdk.org/jdk/pull/12325/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12325&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301497 Stats: 452 lines in 44 files changed: 0 ins; 0 del; 452 mod Patch: https://git.openjdk.org/jdk/pull/12325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12325/head:pull/12325 PR: https://git.openjdk.org/jdk/pull/12325 From jsjolen at openjdk.org Tue Apr 11 10:12:09 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 10:12:09 GMT Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390 In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:40:09 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1.
No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Reviewed and found a decent chunk of things to fix. Don't close yet. src/hotspot/cpu/s390/abstractInterpreter_s390.cpp line 124: > 122: // Parameters: > 123: // > 124: // interpreter_frame != null: nullptr src/hotspot/cpu/s390/frame_s390.cpp line 114: > 112: > 113: // At this point, there still is a chance that fp_safe is false. > 114: // In particular, (fp == null) might be true. So let's check and nullptr src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 407: > 405: __ verify_oop(value, FILE_AND_LINE); > 406: DecoratorSet decorators = IN_NATIVE | ON_PHANTOM_OOP_REF; > 407: g1_write_barrier_pre(masm, decorators, (const Address*)nullptr, value, noreg, tmp1, tmp2, true); unnecessary cast? src/hotspot/cpu/s390/interp_masm_s390.cpp line 278: > 276: Register tmp = Z_ARG3; > 277: load_and_test_long(jvmti_thread_state, Address(Z_thread, JavaThread::jvmti_thread_state_offset())); > 278: z_bre(L); // if (thread->jvmti_thread_state() == null) exit; nullptr src/hotspot/cpu/s390/interp_masm_s390.cpp line 984: > 982: // } else if (THREAD->is_lock_owned((address)displaced_header)) > 983: // // Simple recursive case. 
> 984: // monitor->lock()->set_displaced_header(null); nullptr src/hotspot/cpu/s390/interp_masm_s390.cpp line 1029: > 1027: // } else if (THREAD->is_lock_owned((address)displaced_header)) > 1028: // // Simple recursive case. > 1029: // monitor->lock()->set_displaced_header(null); nullptr src/hotspot/cpu/s390/interp_masm_s390.cpp line 1085: > 1083: // if ((displaced_header = monitor->displaced_header()) == null) { > 1084: // // Recursive unlock. Mark the monitor unlocked by setting the object field to null. > 1085: // monitor->set_obj(null); nullptr src/hotspot/cpu/s390/interp_masm_s390.cpp line 1088: > 1086: // } else if (Atomic::cmpxchg(obj->mark_addr(), monitor, displaced_header) == monitor) { > 1087: // // We swapped the unlocked mark in displaced_header into the object's mark word. > 1088: // monitor->set_obj(null); nullptr src/hotspot/cpu/s390/interp_masm_s390.cpp line 1111: > 1109: // if ((displaced_header = monitor->displaced_header()) == null) { > 1110: // // Recursive unlock. Mark the monitor unlocked by setting the object field to null. > 1111: // monitor->set_obj(null); nullptr src/hotspot/cpu/s390/interp_masm_s390.cpp line 1123: > 1121: // } else if (Atomic::cmpxchg(obj->mark_addr(), monitor, displaced_header) == monitor) { > 1122: // // We swapped the unlocked mark in displaced_header into the object's mark word. > 1123: // monitor->set_obj(null); nullptr src/hotspot/cpu/s390/interp_masm_s390.cpp line 1473: > 1471: // // degenerate decision tree, rooted at row[2] > 1472: // if (row[2].rec == rec) { row[2].incr(); goto done; } > 1473: // if (row[2].rec != null) { count.incr(); goto done; } // overflow nullptr src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3188: > 3186: Register monitor_tagged = displacedHeader; // Tagged with markWord::monitor_value. 
> 3187: bind(object_has_monitor); > 3188: // The object's monitor m is unlocked iff m->owner == null, nullptr src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3391: > 3389: } else { > 3390: if (needs_explicit_null_check((intptr_t)offset)) { > 3391: // Provoke OS null exception if reg = null by reg is src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3397: > 3395: // else > 3396: // Nothing to do, (later) access of M[reg + offset] > 3397: // will provoke OS null exception if reg = null. reg is src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3503: > 3501: // This function calculates the size of the code generated by > 3502: // decode_klass_not_null(register dst, Register src) > 3503: // when (Universe::heap() != null). Hence, if the instructions nullptr src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3669: > 3667: // Rbase - Base address of cKlass in memory. > 3668: // maybenull - True if Rop1 possibly is a null. > 3669: void MacroAssembler::compare_klass_ptr(Register Rop1, int64_t disp, Register Rbase, bool maybenullptr) { `maybe_null` is the best name here, imho. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3803: > 3801: // maybenull - True if Rop1 possibly is a null. > 3802: // maybenulltarget - Branch target for Rop1 == null, if flow control shall NOT continue with compare instruction. > 3803: void MacroAssembler::compare_heap_oop(Register Rop1, Address mem, bool maybenullptr) { make sure these are consistent also src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3917: > 3915: // only32bitValid is set, if later code only uses the lower 32 bits. In this > 3916: // case we must not fix the upper 32 bits. 
> 3917: void MacroAssembler::oop_encoder(Register Rdst, Register Rsrc, bool maybenullptr, Consistency src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3952: > 3950: assert_different_registers(Rdst, Z_R1); > 3951: assert_different_registers(Rsrc, Rbase); > 3952: if (maybenullptr) { Consistency src/hotspot/cpu/s390/macroAssembler_s390.cpp line 4053: > 4051: // - keep Rdst and Rsrc distinct from Rbase. Rdst == Rsrc is ok for performance. > 4052: // - avoid Z_R1 for Rdst if Rdst == Rbase. > 4053: void MacroAssembler::oop_decoder(Register Rdst, Register Rsrc, bool maybenullptr, Register Rbase, int pow2_offset) { Consistency src/hotspot/cpu/s390/macroAssembler_s390.cpp line 4077: > 4075: > 4076: // Rsrc contains a narrow oop. Thus we are sure the leftmost bits will never be set. > 4077: if (maybenullptr) { // null ptr must be preserved! Consistency src/hotspot/cpu/s390/macroAssembler_s390.cpp line 4139: > 4137: // Scale oop and check for null. > 4138: // Rsrc contains a narrow oop. Thus we are sure the leftmost bits will never be set. > 4139: if (maybenullptr) { // null ptr must be preserved! Consistency src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5529: > 5527: // Generated code must not undergo any transformation, e.g. ShortenBranches, to be safe. > 5528: address MacroAssembler::stop_chain(address reentry, int type, const char* msg, int id, bool allow_relocation) { > 5529: BLOCK_COMMENT(err_msg("stop_chain(%s,%s): %s {", reentry==null?"init":"cont", allow_relocation?"reloc ":"static", msg)); Glitched patch src/hotspot/cpu/s390/macroAssembler_s390.hpp line 774: > 772: // This function calculates the size of the code generated by > 773: // decode_klass_not_null(register dst) > 774: // when (Universe::heap() != null). 
Hence, if the instructions nullptr src/hotspot/cpu/s390/macroAssembler_s390.hpp line 785: > 783: int get_oop_base_complement(Register Rbase, uint64_t oop_base); > 784: void compare_heap_oop(Register Rop1, Address mem, bool maybenullptr); > 785: void compare_klass_ptr(Register Rop1, int64_t disp, Register Rbase, bool maybenullptr); Consistency src/hotspot/cpu/s390/macroAssembler_s390.hpp line 808: > 806: Register Rbase = Z_R1, int pow2_offset = -1, bool only32bitValid = false); > 807: void oop_decoder(Register Rdst, Register Rsrc, bool maybenullptr, > 808: Register Rbase = Z_R1, int pow2_offset = -1); Consistency src/hotspot/cpu/s390/nativeInst_s390.cpp line 388: > 386: ShouldNotReachHere(); > 387: #endif > 388: return *(intptr_t *)nullptr; Unnecessary cast? Also, this is de-referencing a null pointer, so we might have to do this differently after the conversion. src/hotspot/cpu/s390/runtime_s390.cpp line 118: > 116: __ z_lgr(Z_SP, saved_sp); > 117: > 118: // [Z_RET]!=null was possible in hotspot5 but not in sapjvm6. nullptr src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 1046: > 1044: __ add2reg(rHandle, oop_slot_offset, Z_SP); > 1045: > 1046: // If Oop == null, use a null handle. is src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 1327: > 1325: in_ByteSize(-1), > 1326: in_ByteSize(-1), > 1327: (OopMapSet *) nullptr); Unnecessary cast? src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 2079: > 2077: __ load_and_test_long(Z_R0_scratch, method_(code)); > 2078: __ z_lg(ientry, method_(interpreter_entry)); // Preload interpreter entry (also if patching). > 2079: __ z_brne(patch_callsite); // Patch required if code != null (compiled target exists). nullptr src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp line 1133: > 1131: // Initialize z_ijava_state->mdx. 
> 1132: Register Rmdp = Z_bcp; > 1133: // native_call: assert that mdo == null nullptr src/hotspot/cpu/s390/templateTable_s390.cpp line 4015: > 4013: __ bind(done); > 4014: // tos = 0: obj == null or obj is not an instanceof the specified klass > 4015: // tos = 1: obj != null and obj is an instanceof the specified klass nullptr src/hotspot/cpu/s390/templateTable_s390.cpp line 4143: > 4141: } > 4142: > 4143: // Rfree_slot != null -> found one nullptr src/hotspot/cpu/s390/vtableStubs_s390.cpp line 85: > 83: > 84: const Register rcvr_klass = Z_R1_scratch; > 85: address npe_addr = __ pc(); // npe == null ptr exception This reads a bit strange. src/hotspot/cpu/s390/vtableStubs_s390.cpp line 198: > 196: // Get receiver klass. > 197: // Must do an explicit check if offset too large or implicit checks are disabled. > 198: address npe_addr = __ pc(); // npe == null ptr exception This also reads a bit strange ------------- PR Review: https://git.openjdk.org/jdk/pull/12325#pullrequestreview-1295242200 PR Comment: https://git.openjdk.org/jdk/pull/12325#issuecomment-1477543951 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104199478 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104235222 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104235828 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104236459 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104236650 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104236732 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104236820 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104236901 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104237008 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104237153 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104237391 PR Review Comment: 
https://git.openjdk.org/jdk/pull/12325#discussion_r1104238486 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104238696 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104238845 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104239028 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104240066 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104240637 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104240952 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104241157 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104241372 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104241453 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104241530 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104241920 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104242237 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104242375 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104242548 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104245997 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104247150 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104247507 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104247736 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104247948 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104248993 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104249718 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104249975 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1104250871 PR Review Comment: 
https://git.openjdk.org/jdk/pull/12325#discussion_r1104251134 From jsjolen at openjdk.org Tue Apr 11 10:12:09 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 11 Apr 2023 10:12:09 GMT Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390 In-Reply-To: References: Message-ID: On Mon, 13 Feb 2023 10:12:25 GMT, Johan Sj?len wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > src/hotspot/cpu/s390/nativeInst_s390.cpp line 388: > >> 386: ShouldNotReachHere(); >> 387: #endif >> 388: return *(intptr_t *)nullptr; > > Unnecessary cast? Also, this is de-referencing a null pointer, so we might have to do this differently after the conversion. Code indicates that we want to crash here (sigsegv and shouldnotreachhere above). 
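The "Unnecessary cast?" questions in this review can be checked with a small stand-alone sketch. It is an illustration only, not HotSpot code: `Address` and `OopMapSet` below are empty stand-in structs, and `is_null` and `pick` are hypothetical helpers. It shows two cases where a cast on `nullptr` matters (dereferencing, and selecting between overloads) and one where it is redundant. Whether the s390 call sites fall into the overloaded case was not checked here.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins; these are not the HotSpot classes of the same name.
struct Address {};
struct OopMapSet {};

// One pointer parameter, no overloads: a bare nullptr converts implicitly,
// so writing (const Address*)nullptr at the call site would be redundant.
inline bool is_null(const Address* a) { return a == nullptr; }

// Two overloads taking different pointer types: pick(nullptr) is ambiguous
// and does not compile, so here the cast is what selects the overload.
inline int pick(const Address*)   { return 1; }
inline int pick(const OopMapSet*) { return 2; }

// nullptr has type std::nullptr_t, which is not a pointer type and cannot be
// dereferenced; only after a cast is there a pointer type to apply '*' to.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr is a std::nullptr_t");
static_assert(!std::is_pointer<std::nullptr_t>::value,
              "std::nullptr_t is not a pointer type");
static_assert(std::is_pointer<decltype((std::intptr_t*)nullptr)>::value,
              "the cast yields a real pointer type");
```

Under these assumptions, an expression such as `*(intptr_t *)nullptr` needs its cast simply to compile, independent of the intent to crash at that point.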
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1162578083

From lucy at openjdk.org Tue Apr 11 10:12:09 2023
From: lucy at openjdk.org (Lutz Schmidt)
Date: Tue, 11 Apr 2023 10:12:09 GMT
Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390
In-Reply-To:
References:
Message-ID: <3N6PLKkvkQG_PdfjmoOUgn4uK2L4i_74YOfvIGhrTj8=.67bfa18a-23bd-4925-ba6d-580d7c358380@github.com>

On Mon, 13 Feb 2023 10:16:01 GMT, Johan Sjölen wrote:

>> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390. Unfortunately the script that does the change isn't perfect, and so we
>> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>>
>> Here are some typical things to look out for:
>>
>> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> src/hotspot/cpu/s390/vtableStubs_s390.cpp line 85:
>
>> 83:
>> 84:   const Register rcvr_klass = Z_R1_scratch;
>> 85:   address npe_addr = __ pc(); // npe == null ptr exception
>
> This reads a bit strange.

Read this as "npe means null pointer exception" or however you would like to spell it.

> src/hotspot/cpu/s390/vtableStubs_s390.cpp line 198:
>
>> 196:   // Get receiver klass.
>> 197: // Must do an explicit check if offset too large or implicit checks are disabled. >> 198: address npe_addr = __ pc(); // npe == null ptr exception > > This also reads a bit strange Read this as "npe means null pointer exception" or however you would like to spell it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1105994804 PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1105995129 From jsjolen at openjdk.org Tue Apr 11 10:12:09 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 11 Apr 2023 10:12:09 GMT Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390 In-Reply-To: <3N6PLKkvkQG_PdfjmoOUgn4uK2L4i_74YOfvIGhrTj8=.67bfa18a-23bd-4925-ba6d-580d7c358380@github.com> References: <3N6PLKkvkQG_PdfjmoOUgn4uK2L4i_74YOfvIGhrTj8=.67bfa18a-23bd-4925-ba6d-580d7c358380@github.com> Message-ID: On Tue, 14 Feb 2023 15:38:04 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/vtableStubs_s390.cpp line 85: >> >>> 83: >>> 84: const Register rcvr_klass = Z_R1_scratch; >>> 85: address npe_addr = __ pc(); // npe == null ptr exception >> >> This reads a bit strange. > > Read this as "npe means null pointer exception" or however you would like to spell it. So it's saying "NPE is short for null pointer exception"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12325#discussion_r1162582116 From fyang at openjdk.org Tue Apr 11 10:51:03 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Apr 2023 10:51:03 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v3] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 09:49:03 GMT, Johan Sj?len wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/riscv. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. 
>> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sj?len has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Replace NULL with nullptr in cpu/riscv Update change looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12324#pullrequestreview-1378969328 From stefank at openjdk.org Tue Apr 11 11:39:17 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Apr 2023 11:39:17 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 11:59:45 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements.
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
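The lock-stack described in this message can be pictured with a minimal stand-alone sketch. This is not the code from the PR: `LockStackSketch`, `ObjRef`, `try_push` and `remove` are hypothetical names chosen for illustration, a plain pointer stands in for an oop, and header CAS, GC-root handling and monitor inflation are left out entirely. It only models the three properties stated above: a fixed capacity of 8, a linear scan to answer "does the current thread own me?", and a fall-back to the slow path on overflow or recursive locking.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an oop; HotSpot's real type is different.
using ObjRef = const void*;

// Illustrative per-thread lock stack: a small fixed-size array of object
// references plus an index one past the last used slot.
class LockStackSketch {
 public:
  static constexpr std::size_t kCapacity = 8;  // "currently with 8 elements"

  std::size_t size() const { return top_; }
  bool is_full() const { return top_ == kCapacity; }

  // "Does the current thread own me?": scan from the newest entry down.
  bool contains(ObjRef o) const {
    for (std::size_t i = top_; i-- > 0;) {
      if (base_[i] == o) return true;
    }
    return false;
  }

  // Fast-path lock. Returns false when the caller would have to take the
  // slow path instead: the stack is full, or the object is already present
  // (recursive locking is not supported by the fast path).
  bool try_push(ObjRef o) {
    if (is_full() || contains(o)) return false;
    base_[top_++] = o;
    return true;
  }

  // Fast-path unlock. Removes the entry, closes the gap, and clears the
  // vacated slot so no stale reference remains above the top index.
  bool remove(ObjRef o) {
    for (std::size_t i = top_; i-- > 0;) {
      if (base_[i] == o) {
        for (std::size_t j = i + 1; j < top_; ++j) base_[j - 1] = base_[j];
        base_[--top_] = nullptr;
        return true;
      }
    }
    return false;
  }

 private:
  std::array<ObjRef, kCapacity> base_{};
  std::size_t top_ = 0;
};
```

In this sketch a `false` result from `try_push` is the point at which a real implementation would inflate the lock to a full monitor; how that is done is outside what the sketch covers.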
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   RISCV update

Changes requested by stefank (Reviewer).

src/hotspot/share/runtime/lockStack.inline.hpp line 111:

> 109:   int end = to_index(_top);
> 110:   for (int i = end - 1; i >= 0; i--) {
> 111:     if (NativeAccess<>::oop_load(&_base[i]) == o) {

The use of NativeAccess here will break Generational ZGC. For other GCs it's just a redundant GC barrier. The actual GC barrier for the oops in the thread header is the start_processing() call.

I was going to propose that you change this to a plain load (as opposed to using RawAccess), but @fisk pointed out that it looks like this code is used from one thread looking into the data structures of another thread, which would make such a load potentially racing. And that makes us also question the plain load of `_top`. Is there anything that ensures that these are not racy loads?
------------- PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1379038780 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1162681920 From rkennke at openjdk.org Tue Apr 11 11:51:21 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 11 Apr 2023 11:51:21 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 11:36:10 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> RISCV update > > src/hotspot/share/runtime/lockStack.inline.hpp line 111: > >> 109: int end = to_index(_top); >> 110: for (int i = end - 1; i >= 0; i--) { >> 111: if (NativeAccess<>::oop_load(&_base[i]) == o) { > > The use of NativeAccess here will break Generational ZGC. For other GCs it's just a redundant GC barrier. The actual GC barrier for the oops in the thread header is the start_processing() call. > > I was going to propose that you changed this to a plain load (as opposed to using RawAccess), but @fisk pointed out that it looks like this code is used from one thread looking into the data structures of another thread, which would make such a load potentially racing. And that makes us also question the plain load of `_top`. Is there anything that ensures that these are not racy loads? The NativeAccess is a left-over from an earlier attempt, and yes I think the start_processing() is the actual barrier. There is a single call-path where we inspect another thread's lock-stack outside of a safepoint (from management/JMX code). We had some arguments back and forth with David about that (somewhere up in this PR) and the conclusion so far is that yes, it is racy, but it doesn't seem to be a problem. 
We might be getting wrong results in the sense that the other thread could change the state of locking in the moment right after we inspect it, but this doesn't look like a correctness problem in the code that's calling it and the problem is pre-existing with current stack-locking, too. See jmm_GetThreadInfo() in management.cpp around lines 1129ff. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1162692775 From stefank at openjdk.org Tue Apr 11 12:18:34 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Apr 2023 12:18:34 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 11:47:46 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/lockStack.inline.hpp line 111: >> >>> 109: int end = to_index(_top); >>> 110: for (int i = end - 1; i >= 0; i--) { >>> 111: if (NativeAccess<>::oop_load(&_base[i]) == o) { >> >> The use of NativeAccess here will break Generational ZGC. For other GCs it's just a redundant GC barrier. The actual GC barrier for the oops in the thread header is the start_processing() call. >> >> I was going to propose that you changed this to a plain load (as opposed to using RawAccess), but @fisk pointed out that it looks like this code is used from one thread looking into the data structures of another thread, which would make such a load potentially racing. And that makes us also question the plain load of `_top`. Is there anything that ensures that these are not racy loads? > > The NativeAccess is a left-over from an earlier attempt, and yes I think the start_processing() is the actual barrier. There is a single call-path where we inspect another thread's lock-stack outside of a safepoint (from management/JMX code). We had some arguments back and forth with David about that (somewhere up in this PR) and the conclusion so far is that yes, it is racy, but it doesn't seem to be a problem. 
We might be getting wrong results in the sense that the other thread could change the state of locking in the moment right after we inspect it, but this doesn't look like a correctness problem in the code that's calling it and the problem is pre-existing with current stack-locking, too. See jmm_GetThreadInfo() in management.cpp around lines 1129ff. It looks to me like the code could racily read the element just above `_top`, which could contain a stale oop. If the address of the stale oop matches the address of `o` then `contains` would incorrectly return true. Did you consider rewriting the racing code to use thread-local handshakes to remove the race? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1162729676 From jsjolen at openjdk.org Tue Apr 11 12:35:03 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 12:35:03 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v3] In-Reply-To: References: Message-ID: <6LA1Ag5CZXGZh5yOv8Li3vL2ILtDkURtFr1D2Jiqvj0=.085dac59-0d32-45fd-8400-bfc8011b9ae1@github.com> On Tue, 11 Apr 2023 10:47:43 GMT, Fei Yang wrote: >> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8301496 >> - Fixes >> - Merge remote-tracking branch 'origin/master' into JDK-8301496 >> - Fixes >> - Merge remote-tracking branch 'origin/master' into JDK-8301496 >> - Replace NULL with nullptr in cpu/riscv > > Updated change looks good. Thanks. Thanks @RealFYang! Would you mind running the tier1 tests on RISC-V? I don't have access to that architecture.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12324#issuecomment-1503242539 From mgronlun at openjdk.org Tue Apr 11 12:36:43 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 11 Apr 2023 12:36:43 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
>> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tool Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback.
If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from its older places will help reduce the sizes of the oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > renames Can I please get a second review to close this one out?
Thanks Markus ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1503245005 From jsjolen at openjdk.org Tue Apr 11 12:40:32 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 12:40:32 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc [v2] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: - Merge remote-tracking branch 'origin/master' into JDK-8301495 - Merge remote-tracking branch 'origin/master' into JDK-8301495 - Revert change in file - Fixes - Merge remote-tracking branch 'origin/master' into JDK-8301495 - reinrich suggestions - Replace NULL with nullptr in cpu/ppc ------------- Changes: https://git.openjdk.org/jdk/pull/12323/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12323&range=01 Stats: 381 lines in 51 files changed: 0 ins; 0 del; 381 mod Patch: https://git.openjdk.org/jdk/pull/12323.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12323/head:pull/12323 PR: https://git.openjdk.org/jdk/pull/12323 From jsjolen at openjdk.org Tue Apr 11 12:40:35 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 12:40:35 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc In-Reply-To: References: Message-ID: <-0VUDu6zWYNwc-3G9KXdPxCmibeMq3uKoUxRdfeHDxM=.5aec8043-76de-4ff2-9a52-8697b10de855@github.com> On Tue, 31 Jan 2023 11:39:48 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! @backwaterred, would you like to check that this builds and tests for PPC :)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12323#issuecomment-1503249174 From jsjolen at openjdk.org Tue Apr 11 12:55:25 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 12:55:25 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v2] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks!
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Explicitly cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12321/files - new: https://git.openjdk.org/jdk/pull/12321/files/86aa1878..45ab2f0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12321.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12321/head:pull/12321 PR: https://git.openjdk.org/jdk/pull/12321 From jsjolen at openjdk.org Tue Apr 11 12:55:28 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 12:55:28 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v2] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 20:45:01 GMT, Stuart Monteith wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Explicitly cast > > src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 270: > >> 268: virtual void pass_object() { >> 269: intptr_t* addr = single_slot_addr(); >> 270: intptr_t value = *addr == 0 ? nullptr : (intptr_t)addr; > > This doesn't compile - perhaps replace nullptr with zero? Unless casting it is more appropriate. There's similar code in other patches; I used a cast in those. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1162770322 From jsjolen at openjdk.org Tue Apr 11 13:22:40 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Apr 2023 13:22:40 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64.
Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - Fix style - Merge remote-tracking branch 'origin/master' into JDK-8301493 - Explicitly cast - Fixes - Replace NULL with nullptr in cpu/aarch64 ------------- Changes: https://git.openjdk.org/jdk/pull/12321/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=02 Stats: 436 lines in 42 files changed: 0 ins; 0 del; 436 mod Patch: https://git.openjdk.org/jdk/pull/12321.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12321/head:pull/12321 PR: https://git.openjdk.org/jdk/pull/12321 From jsjolen at openjdk.org Tue Apr 11 13:22:43 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 11 Apr 2023 13:22:43 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: <-ZV05tb2xNWIBcGc7Nj_TZ6qq3BGrsjlKCT48_GTmQU=.6480f4f9-f1a5-47fa-94d9-51d3968ff711@github.com> References: <-ZV05tb2xNWIBcGc7Nj_TZ6qq3BGrsjlKCT48_GTmQU=.6480f4f9-f1a5-47fa-94d9-51d3968ff711@github.com> Message-ID: On Tue, 21 Mar 2023 13:32:10 GMT, Stuart Monteith wrote: > This looks OK so far. However, is it your intention to also do aarch64.ad? aarch64_ad.m4 and aarch64_vector(.ad|_ad.m4) files look clean. I'm only touching the .cpp/.hpp files in these PRs. The NULL count of all .ad files is only 339, so if I do those it'll all be in one commit (probably). ------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1503335894 From jsjolen at openjdk.org Tue Apr 11 13:23:07 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 11 Apr 2023 13:23:07 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v3] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 09:49:03 GMT, Johan Sj?len wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/riscv. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. 
I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Replace NULL with nullptr in cpu/riscv >org.opentest4j.AssertionFailedError: java.lang.ThreadLocal.set not found!!! ==> expected: but was: linux-x86 test failure. Probably a false alarm, as these changes shouldn't change the behaviour of linux-x86.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12324#issuecomment-1503340680 From rkennke at openjdk.org Tue Apr 11 13:50:22 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 11 Apr 2023 13:50:22 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: Message-ID: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> On Tue, 11 Apr 2023 12:15:07 GMT, Stefan Karlsson wrote: >> The NativeAccess is a left-over from an earlier attempt, and yes I think the start_processing() is the actual barrier. There is a single call-path where we inspect another thread's lock-stack outside of a safepoint (from management/JMX code). We had some arguments back and forth with David about that (somewhere up in this PR) and the conclusion so far is that yes, it is racy, but it doesn't seem to be a problem. We might be getting wrong results in the sense that the other thread could change the state of locking in the moment right after we inspect it, but this doesn't look like a correctness problem in the code that's calling it and the problem is pre-existing with current stack-locking, too. See jmm_GetThreadInfo() in management.cpp around lines 1129ff. > > It looks to me like the code could racily read the element just above `_top`, which could contain a stale oop. If the address of the stale oop matches the address of `o` then `contains` would incorrectly return true. > > Did you consider rewriting the racing code to use thread-local handshakes to remove the race? Hmm you are right. But still - that problem is pre-existing, right? Consider this code in the current stack-locking implementation. If we don't stop the other thread, we may end up following a stack-pointer from the locked-object, and by the time we get to that stack-address, the thread may already have given up that lock and the stack-address could contain some other random stuff.
Re-writing that code to do a handshake would be nice, but I don't think I want to include this in the scope of this PR. If you agree, I would file a separate issue to investigate the problem. As a band-aid, I could add a 'LockingMode == 2' to the if-statement in management.cpp as I already did it earlier: https://github.com/rkennke/jdk/blob/JDK-8291555-v2/src/hotspot/share/services/management.cpp#L1129. This would make all calls into LockStack::contains() happen at a safepoint or only by self-thread, and would certainly make me sleep a little better ;-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1162848799 From lmesnik at openjdk.org Tue Apr 11 13:58:51 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 11 Apr 2023 13:58:51 GMT Subject: Integrated: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 22:20:43 GMT, Leonid Mesnik wrote: > Updated VM internal object allocation C2 intrinsic to post jvmti events when needed. This pull request has now been integrated. 
Changeset: 7a5597c3 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/7a5597c34f3b52d8b7c44647bfdcdfac9301b483 Stats: 74 lines in 11 files changed: 65 ins; 5 del; 4 mod 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects Reviewed-by: kvn, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/13312 From stefank at openjdk.org Tue Apr 11 14:06:21 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Apr 2023 14:06:21 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> Message-ID: <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> On Tue, 11 Apr 2023 13:48:15 GMT, Roman Kennke wrote: >> It looks to me like the code could racily read the element just above `_top`, which could contain a stale oop. If the address of the stale oop matches the address of `o` then `contains` would incorrectly return true. >> >> Did you consider rewriting the racing code to use thread-local handshakes to remove the race? > > Hmm you are right. But still - that problem is pre-existing, right? Consider this code in the current stack-locking implementation. If we don't stop the other thread, we may end up following a stack-pointer from the locked-object, and by the time we get to that stack-address, the thread may already have given up that lock and the stack-address could contain some other random stuff. > Re-writing that code to do a handshake would be nice, but I don't think I want to include this in the scope of this PR. If you agree, I would file a separate issue to investigate the problem.
As a band-aid, I could add a 'LockingMode == 2' to the if-statement in management.cpp as I already did it earlier: https://github.com/rkennke/jdk/blob/JDK-8291555-v2/src/hotspot/share/services/management.cpp#L1129. This would make all calls into LockStack::contains() happen at a safepoint or only by self-thread, and would certainly make me sleep a little better ;-) OK. Given that I haven't looked at the rest of the patch, I leave it up to you and the Reviewers to figure out what to do about this code. Cheers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1162872406 From dcubed at openjdk.org Tue Apr 11 15:33:31 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 11 Apr 2023 15:33:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> Message-ID: On Tue, 11 Apr 2023 14:04:17 GMT, Stefan Karlsson wrote: >> Hmm you are right. But still - that problem is pre-existing, right? Consider this code in the current stack-locking implementation. If we don't stop the other thread, we may end up following a stack-pointer from the locked-object, and by the time we get to that stack-address, the thread may already have given up that lock and the stack-address could contain some other random stuff. >> Re-writing that code to do a handshake would be nice, but I don't think I want to include this in the scope of this PR. If you agree, I would file a separate issue to investigate the problem. As a band-aid, I could add a 'LockingMode == 2' to the if-statement in management.cpp as I already did it earlier: https://github.com/rkennke/jdk/blob/JDK-8291555-v2/src/hotspot/share/services/management.cpp#L1129. 
This would make all calls into LockStack::contains() happen at a safepoint or only by self-thread, and would certainly make me sleep a little better ;-) > > OK. Given that I haven't looked at the rest of the patch, I leave it up to you and the Reviewers to figure out what to do about this code. Cheers. Given that the race with new lightweight locking is virtually the same as the race with legacy stack locking, please do not put back the 'LockingMode == 2' check which would make `jmm_GetThreadInfo()` calls slower with new lightweight locking than with legacy stack locking. Perhaps I'm not understanding the risk of what @stefank means with: It looks to me like the code could racily read the element just above `_top`, which could contain a stale oop. If the address of the stale oop matches the address of `o` then `contains` would incorrectly return true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1162994635 From sspitsyn at openjdk.org Tue Apr 11 16:59:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Apr 2023 16:59:48 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 12:33:32 GMT, Markus Grönlund wrote: > Can I please get a second review to close this one out? Markus, I'm still working on it and close to finishing. I have some questions to ask. In fact, I gave up trying to prove that this refactoring does not break anything, so we should rely on testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1503774347 From mgronlun at openjdk.org Tue Apr 11 17:02:46 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 11 Apr 2023 17:02:46 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 16:56:49 GMT, Serguei Spitsyn wrote: > > Can I please get a second review to close this one out?
> > Markus, I'm still working on it and close to finishing. I have some questions to ask. In fact, I gave up trying to prove that this refactoring does not break anything, so we should rely on testing. No worries, Serguei. Thank you for taking a look, please take your time. I have run tiers 1-6. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1503779085 From iklam at openjdk.org Tue Apr 11 17:42:31 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 11 Apr 2023 17:42:31 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block In-Reply-To: References: Message-ID: On Sat, 8 Apr 2023 07:14:30 GMT, Thomas Stuefe wrote: > This looks like a nice simplification. Will you also combine all mappings at OS level to a single one, so that you only need one mmap call? Now there's a single call to mmap() in FileMapInfo::map_heap_regions_impl_inner(). This reminds me that I should remove the "s" from "heap_regions" in the function names. Will do that in my next commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13284#issuecomment-1503827701 From jbhateja at openjdk.org Tue Apr 11 17:50:45 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 11 Apr 2023 17:50:45 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 17:13:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations.
>> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > special case iotaShuffle Marked as reviewed by jbhateja (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13093#pullrequestreview-1379800113 From jbhateja at openjdk.org Tue Apr 11 17:50:48 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 11 Apr 2023 17:50:48 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: <92ZNVJBTNzBNw4reI-1HMd5nvWlNdz_-0wfvGvGe5nk=.e601910e-8c04-48f4-a604-cd14e7b75ee0@github.com> On Tue, 11 Apr 2023 09:36:06 GMT, Quan Anh Mai wrote: >> Hi @merykitty , Agree with you that SPECIES_PREFERRED is preferred for vector algorithms intercepting both integral and floating point vectors. >> >> FTR, we see a perf regression with Float256 based micro now on AVX=1 targets, >> >> >> public static short micro() { >> VectorShuffle iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true); >> return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1); >> } >> >> CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . 
shufflef >> CompileCommand: compileonly shufflef.micro bool compileonly = true >> ** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0 >> ** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0 >> @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) >> @ 34 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 54 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic) >> @ 17 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 24 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) >> @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) >> @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) >> @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) >> [time] 386ms [res]3392 >> CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/ >> CPROMPT>export PATH=$JAVA_HOME/bin:$PATH >> CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . 
shufflef >> CompileCommand: compileonly shufflef.micro bool compileonly = true >> WARNING: Using incubator modules: jdk.incubator.vector >> @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) >> @ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic) >> @ 17 jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes) (intrinsic) >> @ 292 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 298 java.lang.Object::getClass (0 bytes) (intrinsic) >> @ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic) >> @ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic) >> [time] 7ms [res]3392 > > @jatin-bhateja Since `Float256Shuffle` is represented as a 256-bit int vector, which is not supported by AVX1, the compiled code falls back to Java implementation, which explains the regression. However, having a `VectorShuffle` but not for `Vector::rearrange` is not really useful, and the code snippet is similar to `ShortVector.SPECIES_128.iotaShuffle(0, 1, true).toVector().reinterpretAsShorts().lane(1)`. As a result, I think having some regressions in edge cases of AVX1 is acceptable in contrast with the improvement in all other operations on all platforms. Agree, this is also fixing less than 32 bit shuffle vectors case, i.e. shuffles involving Long128, Int64 and Float64 will get benefitted on x86. 
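For readers following the shuffle discussion, here is a minimal scalar sketch of what a `rearrange` computes lane by lane. It is plain Java and does not use `jdk.incubator.vector`. The class name `RearrangeSketch` and its `rearrange` helper are hypothetical, and the bounds check on the raw index is a simplification of how the real API treats exceptional shuffle indexes.

```java
import java.util.Arrays;

public class RearrangeSketch {
    // Scalar model of v.rearrange(shuffle): lane i of the result takes the
    // source lane selected by indexes[i]. Hypothetical helper, not JDK code.
    public static int[] rearrange(int[] src, int[] indexes) {
        int[] dst = new int[src.length];
        for (int i = 0; i < dst.length; i++) {
            int idx = indexes[i];
            // Simplified range check: reject any index outside [0, length).
            if (idx < 0 || idx >= src.length) {
                throw new IndexOutOfBoundsException("lane " + i + " selects " + idx);
            }
            dst[i] = src[idx];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] indexes = {3, 2, 1, 0};
        // prints [40, 30, 20, 10]
        System.out.println(Arrays.toString(rearrange(src, indexes)));
    }
}
```

When the indexes already have the width of the lanes, as in this model, the check and the permutation can work on them directly, which seems to be the point of storing the shuffle as an integral vector.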
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1163147535 From jvernee at openjdk.org Tue Apr 11 17:51:45 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 11 Apr 2023 17:51:45 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v20] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 10:54:18 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - 8305369: Issues in zero-length memory segment javadoc section > - 8305087: MemoryLayout API checks should be more eager test/jdk/java/foreign/TestByteBuffer.java line 317: > 315: > 316: @Test > 317: public void testMappedSegmentAsByteBuffer() throws Throwable { While testing something else, I noticed that this test is failing on Linux/WSL since the WSL 1 kernel does not implement `mincore`. We can simply skip the test in that case, as we already do for other tests. 
I've submitted a patch here: https://github.com/minborg/jdk/pull/2 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1163148467 From vlivanov at openjdk.org Tue Apr 11 18:03:50 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 11 Apr 2023 18:03:50 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v20] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 10:54:18 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - 8305369: Issues in zero-length memory segment javadoc section > - 8305087: MemoryLayout API checks should be more eager Hotspot changes look good. src/hotspot/share/prims/jvm.cpp line 3473: > 3471: > 3472: JVM_LEAF(jboolean, JVM_IsForeignLinkerSupported(void)) > 3473: return ForeignGlobals::has_port() ? JNI_TRUE : JNI_FALSE; On naming: I find `has_port()` confusing. Why don't you simply call VM counterpart `is_foreign_linker_supported`? Alternative ideas: `is_supported()`, `has_native_support()`. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13079#pullrequestreview-1379816547 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1163158810 From jjg at openjdk.org Tue Apr 11 18:21:56 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Tue, 11 Apr 2023 18:21:56 GMT Subject: RFR: JDK-8305713: DocCommentParser: merge blockContent and inlineContent Message-ID: Please review a cleanup in DocCommentParser to merge blockContent and inlineContent into a single method to parse "rich content" in a doc comment. 
------------- Depends on: https://git.openjdk.org/jdk/pull/13362 Commit messages: - JDK-8305713: DocCommentParser: merge blockContent and inlineContent - 8272119: Typo in JDK documentation (a -> an) - 8305461: [vectorapi] Add VectorMask::xor - 8305608: Change VMConnection to use "test.class.path"instead of "test.classes" - 8274166: Some CDS tests ignore -Dtest.cds.runtime.options - 8304745: Lazily initialize byte[] in java.io.BufferedInputStream - 8267140: Support closing the HttpClient by making it auto-closable - 8269843: typo in LinkedHashMap::removeEldestEntry spec - 8305480: test/hotspot/jtreg/runtime/NMT/VirtualAllocCommitMerge.java failing on 32 bit arm - 8305607: Remove some unused test parameters in com/sun/jdi tests - ... and 6 more: https://git.openjdk.org/jdk/compare/44f33ad1...b8b43eae Changes: https://git.openjdk.org/jdk/pull/13431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13431&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305713 Stats: 5934 lines in 119 files changed: 5071 ins; 480 del; 383 mod Patch: https://git.openjdk.org/jdk/pull/13431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13431/head:pull/13431 PR: https://git.openjdk.org/jdk/pull/13431 From jjg at openjdk.org Tue Apr 11 18:35:12 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Tue, 11 Apr 2023 18:35:12 GMT Subject: RFR: JDK-8305713: DocCommentParser: merge blockContent and inlineContent [v2] In-Reply-To: References: Message-ID: > Please review a cleanup in DocCommentParser to merge blockContent and inlineContent into a single method to parse "rich content" in a doc comment. 
Jonathan Gibbons has updated the pull request incrementally with 42 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8305713.dcp-content - 8305809: (fs) Review obsolete Linux kernel dependency on os.version (Unix kernel 2.6.39) Reviewed-by: rriggs, alanb - 8294806: jpackaged-app ignores splash screen from jar file Reviewed-by: almatvee - 8305368: G1 remset chunk claiming may use relaxed memory ordering Reviewed-by: ayang, iwalulya - 8305370: Inconsistent use of for_young_only_phase parameter in G1 predictions Reviewed-by: iwalulya, kbarrett - 8305663: Wrong iteration order of pause array in g1MMUTracker Reviewed-by: ayang, tschatzl - 8305761: Resolve multiple definition of 'jvm' when statically linking with JDK native libraries Reviewed-by: alanb, kevinw - 8305419: JDK-8301995 broke building libgraal Reviewed-by: matsaave, dnsimon, thartmann - 8302696: Revert API signature changes made in JDK-8285504 and JDK-8285263 Reviewed-by: mullan - 8304738: UnregisteredClassesTable_lock never created Reviewed-by: iklam, jcking, dholmes - ... 
and 32 more: https://git.openjdk.org/jdk/compare/b8b43eae...ab56c463 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13431/files - new: https://git.openjdk.org/jdk/pull/13431/files/b8b43eae..ab56c463 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13431&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13431&range=00-01 Stats: 9487 lines in 384 files changed: 2654 ins; 5914 del; 919 mod Patch: https://git.openjdk.org/jdk/pull/13431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13431/head:pull/13431 PR: https://git.openjdk.org/jdk/pull/13431 From jjg at openjdk.org Tue Apr 11 18:39:53 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Tue, 11 Apr 2023 18:39:53 GMT Subject: RFR: JDK-8305713: DocCommentParser: merge blockContent and inlineContent [v3] In-Reply-To: References: Message-ID: > Please review a cleanup in DocCommentParser to merge blockContent and inlineContent into a single method to parse "rich content" in a doc comment. Jonathan Gibbons has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 59 commits: - Merge branch 'pr/13362' into pr/13362 - Merge remote-tracking branch 'upstream/master' into 8305713.dcp-content - 8305809: (fs) Review obsolete Linux kernel dependency on os.version (Unix kernel 2.6.39) Reviewed-by: rriggs, alanb - 8294806: jpackaged-app ignores splash screen from jar file Reviewed-by: almatvee - 8305368: G1 remset chunk claiming may use relaxed memory ordering Reviewed-by: ayang, iwalulya - 8305370: Inconsistent use of for_young_only_phase parameter in G1 predictions Reviewed-by: iwalulya, kbarrett - 8305663: Wrong iteration order of pause array in g1MMUTracker Reviewed-by: ayang, tschatzl - 8305761: Resolve multiple definition of 'jvm' when statically linking with JDK native libraries Reviewed-by: alanb, kevinw - 8305419: JDK-8301995 broke building libgraal Reviewed-by: matsaave, dnsimon, thartmann - 8302696: Revert API signature changes made in JDK-8285504 and JDK-8285263 Reviewed-by: mullan - ... and 49 more: https://git.openjdk.org/jdk/compare/5501bf21...45ee028b ------------- Changes: https://git.openjdk.org/jdk/pull/13431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13431&range=02 Stats: 15292 lines in 500 files changed: 7710 ins; 6358 del; 1224 mod Patch: https://git.openjdk.org/jdk/pull/13431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13431/head:pull/13431 PR: https://git.openjdk.org/jdk/pull/13431 From amenkov at openjdk.org Tue Apr 11 18:56:37 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 11 Apr 2023 18:56:37 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v5] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Tue, 11 Apr 2023 03:07:04 GMT, David Holmes wrote: > Seems to me the bug report is asking for unmounted virtual threads to be considered roots - but virtual threads are deliberately not roots. 
Any unmounted virtual thread should be reachable from either the scheduler or whatever object the VT is parked on, so if they are not showing up then perhaps the wrong reference is being followed. If there is a bug/missing-functionality in the FollowReferences implementation then fixing it is of course fine, but that is not what the bug report seems to be about. The bug is that objects referenced only from the stack of an unmounted VT are not reported by FollowReferences. So we need to detect stackChunk objects and report references from them. Next question: how to report them (jvmtiHeapReferenceKind). Stack locals for Java threads are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL, and this looks like the appropriate kind for these references. For the JVMTI_HEAP_REFERENCE_STACK_LOCAL kind, reference_info should contain a pointer to a jvmtiHeapReferenceInfoStackLocal structure, which contains info about the thread (thread_tag, thread_id). It would be strange to report stack locals from a thread without reporting the thread itself (a reference with JVMTI_HEAP_REFERENCE_THREAD kind), so we need to detect the corresponding VirtualThread and report it. So we implicitly consider VTs and their stack locals as roots, and this made me think we need to report all of them. But as you mentioned, VTs are deliberately not roots, so maybe we don't need to detect all of them, and reporting only the objects we find by following references is enough. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1503929711 From vlivanov at openjdk.org Tue Apr 11 19:01:48 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 11 Apr 2023 19:01:48 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 17:13:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage.
This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs 
$0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > special case iotaShuffle Nice refactoring! Happy to see so much code gone. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13093#pullrequestreview-1379896647 From vlivanov at openjdk.org Tue Apr 11 19:06:38 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 11 Apr 2023 19:06:38 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 13:46:12 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. >> >> Please take a look and have some reviews. Thank you very much. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ShortVector.java line 2295: > 2293: // to be performant > 2294: @ForceInline > 2295: public ShortVector apply(ShortVector v1, ShortVector v2, int o) { Have you considered matching the corresponding IR during GVN to produce VectorSlice nodes rather than going through VM intrinsic? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1163216924 From dlong at openjdk.org Tue Apr 11 19:39:20 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Apr 2023 19:39:20 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> Message-ID: On Tue, 11 Apr 2023 15:29:16 GMT, Daniel D. Daugherty wrote: >> OK. Given that I haven't looked at the rest of the patch, I leave it up to you and the Reviewers to figure out what to do about this code. Cheers. > > Given that the race with new lightweight locking is virtually the same as the race > with legacy stack locking, please do not put back the 'LockingMode == 2' check > which would make `jmm_GetThreadInfo()` calls slower with new lightweight locking > than with legacy stack locking. > > Perhaps I'm not understanding the risk of what @stefank means with: > > It looks to me like the code could read racingly read the element just above _top, > which could contain a stale oop. If the address of the stale oop matches the > address of o then contains would incorrectly return true. The `_base` array is only initialized to nullptr in debug builds. I don't see a release barrier in LockStack::push between the update to _base[] and the update to _top, nor a corresponding acquire barrier when reading. 
Doesn't this mean it is possible to racily read an uninitialized junk oop value from _base[], especially on weak memory models? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163243827 From rkennke at openjdk.org Tue Apr 11 20:00:23 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 11 Apr 2023 20:00:23 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> Message-ID: On Tue, 11 Apr 2023 19:35:36 GMT, Dean Long wrote: >> Given that the race with new lightweight locking is virtually the same as the race >> with legacy stack locking, please do not put back the 'LockingMode == 2' check >> which would make `jmm_GetThreadInfo()` calls slower with new lightweight locking >> than with legacy stack locking. >> >> Perhaps I'm not understanding the risk of what @stefank means with: >> >> It looks to me like the code could read racingly read the element just above _top, >> which could contain a stale oop. If the address of the stale oop matches the >> address of o then contains would incorrectly return true. > > The `_base` array is only initialized to nullptr in debug builds. I don't see a release barrier in LockStack::push between the update to _base[] and the update to _top, nor a corresponding acquire barrier when reading. Doesn't this mean it is possible to racily read an uninitialized junk oop value from _base[], especially on weak memory models? Yes. The whole LockStack is not meant to be accessed cross-thread, pretty much like any thread's stack is not meant to be accessed like that (including current stack-locking). So what can go wrong? With the new locking, we could read junk and compare it to the oop that we're testing against and get a wrong result. We're not going to crash though. 
With the current stack-locking, we would fetch the stack-pointer and check if that address is within the foreign thread's stack. Again, because the other thread is not holding still, we might get a wrong result, but we would not crash. So I guess we need to answer the question whether or not jmm_GetThreadInfo() is ok with returning wrong result and what could be the consequences of this. For example, how important is it that the info about the thread(s) is correct and consistent (e.g. what happens if we report two threads both holding the same lock?), etc. But I don't consider this to be part of this PR. So my proposal is: leave that code as it is, for now (being racy when inspecting foreign threads, but don't crash). Open a new issue to investigate and possibly fix the problem (maybe by safepointing, maybe by handshaking if that is enough, or maybe we find out we don't need to do anything). Add comments in relevant places to point out the problem like you and David suggested earlier. Would that be ok? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163263926 From dlong at openjdk.org Tue Apr 11 20:42:16 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Apr 2023 20:42:16 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> Message-ID: <6vD1PFLLelAVWsCl3YpuPBhd_tuc-xlE3wH_HCp7Lu8=.6b9ed684-f94c-434e-82df-15003ded284d@github.com> On Tue, 11 Apr 2023 19:58:19 GMT, Roman Kennke wrote: >> The `_base` array is only initialized to nullptr in debug builds. I don't see a release barrier in LockStack::push between the update to _base[] and the update to _top, nor a corresponding acquire barrier when reading. 
Doesn't this mean it is possible to racily read an uninitialized junk oop value from _base[], especially on weak memory models? > > Yes. The whole LockStack is not meant to be accessed cross-thread, pretty much like any thread's stack is not meant to be accessed like that (including current stack-locking). So what can go wrong? > With the new locking, we could read junk and compare it to the oop that we're testing against and get a wrong result. We're not going to crash though. > With the current stack-locking, we would fetch the stack-pointer and check if that address is within the foreign thread's stack. Again, because the other thread is not holding still, we might get a wrong result, but we would not crash. > So I guess we need to answer the question whether or not jmm_GetThreadInfo() is ok with returning wrong result and what could be the consequences of this. For example, how important is it that the info about the thread(s) is correct and consistent (e.g. what happens if we report two threads both holding the same lock?), etc. But I don't consider this to be part of this PR. > > So my proposal is: leave that code as it is, for now (being racy when inspecting foreign threads, but don't crash). Open a new issue to investigate and possibly fix the problem (maybe by safepointing, maybe by handshaking if that is enough, or maybe we find out we don't need to do anything). Add comments in relevant places to point out the problem like you and David suggested earlier. Would that be ok? That seems fine to me, as long as we don't crash. But my understanding is that Generational ZGC will crash if it sees a stale oop. Isn't it possible that the racing read sees junk that looks to Generational ZGC like a stale oop? To avoid this, unused slots may need to be set to nullptr even in product builds. But I'm not a GC expert so maybe there's no problem. 
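The ordering concern raised above can be modelled with a minimal Java sketch. This is not HotSpot's C++ `LockStack`; the class `LockStackSketch`, its fixed capacity, and its method names are hypothetical. It assumes a single owner thread that pushes and pops, publishes `top` with a release store, and clears unused slots, while a reader on another thread uses an acquire load. Java arrays are zero-initialized, so the sketch cannot reproduce a junk read; it only illustrates the pairing of barriers.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class LockStackSketch {
    private static final VarHandle TOP;
    static {
        try {
            TOP = MethodHandles.lookup()
                    .findVarHandle(LockStackSketch.class, "top", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final Object[] base = new Object[8]; // hypothetical fixed capacity
    private int top;                             // number of live slots

    // Owner thread only: write the slot first, then publish the new top
    // with release semantics so the slot write is visible before the index.
    public void push(Object o) {
        int t = top;
        base[t] = o;
        TOP.setRelease(this, t + 1);
    }

    // Owner thread only: clear the slot so no stale reference stays above top.
    public Object pop() {
        int t = top - 1;
        Object o = base[t];
        base[t] = null;
        TOP.setRelease(this, t);
        return o;
    }

    // May run on another thread. The acquire load pairs with the release
    // stores above; the answer can still be out of date by the time it is used.
    public boolean contains(Object o) {
        int t = (int) TOP.getAcquire(this);
        for (int i = 0; i < t; i++) {
            if (base[i] == o) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LockStackSketch stack = new LockStackSketch();
        Object a = new Object();
        Object b = new Object();
        stack.push(a);
        stack.push(b);
        System.out.println(stack.contains(b)); // prints true
        stack.pop();
        System.out.println(stack.contains(b)); // prints false
    }
}
```

Even with this pairing, a cross-thread `contains` remains a snapshot of a moving target, which matches the point in the thread that correctness of the answer would need a safepoint or handshake rather than barriers alone.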
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163306288 From rrich at openjdk.org Tue Apr 11 20:52:42 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 11 Apr 2023 20:52:42 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc [v2] In-Reply-To: References: Message-ID: <_IjC5c4-kfZhNi1DRH1sDTgW0fytxYUv4ycmakBeE28=.cbbf578b-4eee-4d6c-9ab2-83b673bb6a5d@github.com> On Tue, 11 Apr 2023 12:40:32 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Revert change in file > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - reinrich suggestions > - Replace NULL with nullptr in cpu/ppc I've built fastdebug and release on ppc64le successfully. Hotspot tier1 tests succeeded also. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12323#pullrequestreview-1380066776 From duke at openjdk.org Tue Apr 11 21:50:36 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 11 Apr 2023 21:50:36 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block In-Reply-To: References: Message-ID: <3ybhZpVsw5iki0H2OkFswEIqFfEFC0YwNhp9chzu5yU=.8b29a446-3be0-4ae8-a8d0-948003be0411@github.com> On Mon, 3 Apr 2023 03:32:27 GMT, Ioi Lam wrote: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). 
For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Marked as reviewed by ashu-mehra at github.com (no known OpenJDK username). cds changes look good! just few nitpicks. src/hotspot/share/cds/archiveHeapLoader.cpp line 265: > 263: MemRegion& archive_space) { > 264: size_t total_bytes = 0; > 265: int i = MetaspaceShared::hp; nitpick: this can be replaced with a better variable name instead of `i`, probably region_idx. src/hotspot/share/cds/archiveHeapLoader.cpp line 274: > 272: assert(is_aligned(r->used(), HeapWordSize), "must be"); > 273: total_bytes += r->used(); > 274: loaded_region->_region_index = i; nitpick: we can do away with `_region_index` and use `MetaspaceShared::hp` wherever required. src/hotspot/share/cds/archiveHeapLoader.cpp line 445: > 443: } > 444: > 445: int i = MetaspaceShared::hp; nitpick: same as before, suggest to replace `i` with `region_idx`. src/hotspot/share/cds/archiveHeapWriter.cpp line 54: > 52: // The following are offsets from buffer_bottom() > 53: size_t ArchiveHeapWriter::_buffer_used; > 54: size_t ArchiveHeapWriter::_heap_roots_bottom; nitpick: would be clearer if `_heap_roots_bottom` is named as `_heap_roots_bottom_offset` src/hotspot/share/cds/metaspaceShared.hpp line 63: > 61: ro = 1, // read-only shared space > 62: bm = 2, // relocation bitmaps (freed after file mapping is finished) > 63: hp = 3, // relocation bitmaps (freed after file mapping is finished) This comment needs to be updated. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13284#pullrequestreview-1380125495 PR Comment: https://git.openjdk.org/jdk/pull/13284#issuecomment-1504143568 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163361181 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163361230 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163361633 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163362914 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163362267 From cslucas at openjdk.org Tue Apr 11 22:06:42 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 11 Apr 2023 22:06:42 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v7] In-Reply-To: <4uPGi8Ulap_QoQpkL1zTZUdP-jdL_WDEkpdP7asLow4=.9047ce21-688f-4d29-a643-f9acfd4344c7@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <4uPGi8Ulap_QoQpkL1zTZUdP-jdL_WDEkpdP7asLow4=.9047ce21-688f-4d29-a643-f9acfd4344c7@github.com> Message-ID: On Thu, 6 Apr 2023 04:34:52 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. 
>> - Merge master >> - Add support for rematerializing scalar replaced objects participating in allocation merges > > src/hotspot/share/opto/output.cpp line 755: > >> 753: ciKlass* cik = t->is_oopptr()->exact_klass(); >> 754: assert(cik->is_instance_klass() || >> 755: cik->is_array_klass(), "Not supported allocation."); > > Why did the spacing change? The indentation level was incorrect before. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1163375015 From cslucas at openjdk.org Wed Apr 12 00:32:40 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 12 Apr 2023 00:32:40 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v7] In-Reply-To: <4uPGi8Ulap_QoQpkL1zTZUdP-jdL_WDEkpdP7asLow4=.9047ce21-688f-4d29-a643-f9acfd4344c7@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <4uPGi8Ulap_QoQpkL1zTZUdP-jdL_WDEkpdP7asLow4=.9047ce21-688f-4d29-a643-f9acfd4344c7@github.com> Message-ID: On Thu, 6 Apr 2023 03:25:31 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings.
>> - Merge master >> - Add support for rematerializing scalar replaced objects participating in allocation merges > > src/hotspot/share/opto/escape.cpp line 633: > >> 631: >> 632: SafePointScalarMergeNode* smerge = new SafePointScalarMergeNode(merge_t, merge_idx); >> 633: smerge->init_req(0, _compile->root()); > > Maybe use ophi's control here; it should stay below the merge point. Was there a reason you used `root`? To be honest, for this Node, I thought it didn't matter. I actually just used the same pattern as in PhaseMacroExpand. I'll adjust the patch as you suggested, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1163448361 From amenkov at openjdk.org Wed Apr 12 01:06:46 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 12 Apr 2023 01:06:46 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v6] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - added heap scanning to report unmounted vthreads; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth; > - common code to handle stack frames is moved into a separate class; Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: removed full heap scan.
unmounted VT are not considered roots and reported only from references ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/f7831794..f85e95ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=04-05 Stats: 144 lines in 2 files changed: 62 ins; 74 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From amenkov at openjdk.org Wed Apr 12 01:12:49 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 12 Apr 2023 01:12:49 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v7] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - added heap scanning to report unmounted vthreads; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class and object tags/frame depth; > - common code to handle stack frames is moved into a separate class; Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Fixed indent in collect_vthread_stack_roots ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/f85e95ba..d95a8426 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=05-06 Stats: 43 lines in 1 file changed: 7 ins; 7 del; 29 mod Patch:
https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From kbarrett at openjdk.org Wed Apr 12 01:38:49 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 12 Apr 2023 01:38:49 GMT Subject: RFR: 8250269: Replace ATTRIBUTE_ALIGNED with alignas [v15] In-Reply-To: References: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> Message-ID: On Sat, 4 Feb 2023 15:05:06 GMT, Julian Waters wrote: >> C++11 added the alignas attribute, for the purpose of specifying alignment on types, much like compiler specific syntax such as gcc's __attribute__((aligned(x))) or Visual C++'s __declspec(align(x)). >> >> We can phase out the use of the macro in favor of the standard attribute. In the meantime, we can replace the compiler specific definitions of ATTRIBUTE_ALIGNED with a portable definition. We might deprecate the use of the macro but changing its implementation quickly and cleanly applies the feature where the macro is being used. >> >> Note: With certain parts of HotSpot using ATTRIBUTE_ALIGNED so indiscriminately, this commit will likely take some time to get right >> >> This will require adding the alignas attribute to the list of language features approved for use in HotSpot code. (Completed with [8297912](https://github.com/openjdk/jdk/pull/11446)) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - ... and 5 more: https://git.openjdk.org/jdk/compare/493aa68c...a621bb62 I've been meaning to review this but have been swamped. Sorry. I don't think this change to HotSpot should be combined with JDK-8305341 / PR#13258. I'm concerned there might be uses of ATTRIBUTE_ALIGNED in other places than at the front of the declaration (like the fixed offset_of macro in the proposed changes). Obviously there aren't any that break compilation. But is alignas in other places valid but with a different meaning? 
For a discussion of the kind of thing I'm concerned about, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108796 ------------- PR Comment: https://git.openjdk.org/jdk/pull/11431#issuecomment-1504398273 From kbarrett at openjdk.org Wed Apr 12 01:41:34 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 12 Apr 2023 01:41:34 GMT Subject: RFR: 8305341: Alignment should be enforced by alignas instead of compiler specific attributes [v3] In-Reply-To: References: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> Message-ID: On Sat, 8 Apr 2023 13:24:37 GMT, Julian Waters wrote: >> C11 has been stable for a long time on all platforms, so native code can use the standard alignas operator for alignment requirements > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Semicolon I don't think the other bug/PR (JDK-8250269, PR#11431) to change HotSpot's ATTRIBUTE_ALIGNED should be combined with the changes outside of HotSpot. They are doing rather different things, despite the token "alignas" occurring in both. I've been meaning to review PR#11431, but have been swamped. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13258#issuecomment-1504401317 From dholmes at openjdk.org Wed Apr 12 02:11:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Apr 2023 02:11:14 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: <6vD1PFLLelAVWsCl3YpuPBhd_tuc-xlE3wH_HCp7Lu8=.6b9ed684-f94c-434e-82df-15003ded284d@github.com> References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> <6vD1PFLLelAVWsCl3YpuPBhd_tuc-xlE3wH_HCp7Lu8=.6b9ed684-f94c-434e-82df-15003ded284d@github.com> Message-ID: On Tue, 11 Apr 2023 20:40:14 GMT, Dean Long wrote: >> Yes.
The whole LockStack is not meant to be accessed cross-thread, pretty much like any thread's stack is not meant to be accessed like that (including current stack-locking). So what can go wrong? >> With the new locking, we could read junk and compare it to the oop that we're testing against and get a wrong result. We're not going to crash though. >> With the current stack-locking, we would fetch the stack-pointer and check if that address is within the foreign thread's stack. Again, because the other thread is not holding still, we might get a wrong result, but we would not crash. >> So I guess we need to answer the question whether or not jmm_GetThreadInfo() is ok with returning a wrong result and what could be the consequences of this. For example, how important is it that the info about the thread(s) is correct and consistent (e.g. what happens if we report two threads both holding the same lock?), etc. But I don't consider this to be part of this PR. >> >> So my proposal is: leave that code as it is, for now (being racy when inspecting foreign threads, but don't crash). Open a new issue to investigate and possibly fix the problem (maybe by safepointing, maybe by handshaking if that is enough, or maybe we find out we don't need to do anything). Add comments in relevant places to point out the problem like you and David suggested earlier. Would that be ok? > > That seems fine to me, as long as we don't crash. But my understanding is that Generational ZGC will crash if it sees a stale oop. Isn't it possible that the racing read sees junk that looks to Generational ZGC like a stale oop? To avoid this, unused slots may need to be set to nullptr even in product builds. But I'm not a GC expert so maybe there's no problem. The old code is "racy but safe" - it basically answers the question "what thread held the lock at the time I was asking?"
and if we get a stack-addr as the owner at the time we ask, and that stack-address belongs to a given thread t then we report t as the owner. The fact t may have released the lock as soon as we read the stack-addr is immaterial. The new code may be a different matter however. Now the race involves oops, and potentially stale ones IIUC what Stefan is saying. So now the race is not safe, and potentially may crash. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163491093 From dholmes at openjdk.org Wed Apr 12 02:19:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Apr 2023 02:19:32 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v5] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Tue, 11 Apr 2023 18:54:09 GMT, Alex Menkov wrote: > The bug is about objects referenced only from stack of unmounted VT are not reported by FollowReferences. > So we need to detect stackChunk object and report references from them. If an object is only reachable from the stack of a VT and the VT itself is not followed then we don't find that object either. But as I said a VT should be found via the object it is parked on, or via the scheduler. So is it the case that the current logic will not follow the stack of an unmounted Virtual thread? If so that seems wrong - especially if a mounted VT would find those objects. A VT is not a GC root but now I'm unsure exactly what that means. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1504443345 From kbarrett at openjdk.org Wed Apr 12 02:39:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 12 Apr 2023 02:39:36 GMT Subject: RFR: 8305341: Alignment should be enforced by alignas instead of compiler specific attributes [v3] In-Reply-To: References: <2d60fxZxeWZEngMaSE1N4JZz07XkvbXj8jrN_hMbo-0=.51ffb82f-2beb-43f7-9195-062555599d0b@github.com> Message-ID: On Sat, 8 Apr 2023 13:24:37 GMT, Julian Waters wrote: >> C11 has been stable for a long time on all platforms, so native code can use the standard alignas operator for alignment requirements > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Semicolon I was pretty confused by the C changes for a while, since I didn't know or had forgotten that the 32bit Windows ABI only guarantees 4 byte stack alignment, and permits "misaligned" local variables for types such as long long and double. (And I failed to find any definitive documentation for that.) Yuck! What is the purpose of removing `defined(_MSC_VER)` from the conditionals? Is this to allow for other compilers that similarly only ensure 4 byte alignment of the stack? If so, it would have been nice to mention that separate concern in the PR description, rather than making reviewers guess. And do such compilers actually exist? A bit of research suggests gcc (at least) maintains 16 byte alignment (and may warrant the use of -mstackrealign or other hoops). See, for example, https://github.com/uTox/uTox/issues/1304. Is `defined(_WIN32)` really the right conditional? That's true for pretty much any Visual Studio supported target, both 32bit and 64bit. But the alignment spec is effectively a nop for 64bit platforms, so harmless. 
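The stack-alignment concern above can be sketched as follows. This is an illustrative C++ fragment, not code from the PR (which touches C sources); the helper names `is_aligned_to` and `aligned_local_is_8_byte_aligned` are hypothetical. It shows a local whose declaration requests 8-byte alignment with `alignas`, which should matter only where the ABI might otherwise place such a local on a 4-byte boundary:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment`.
// The alignment is required to be a non-zero power of two.
inline bool is_aligned_to(const void* p, std::uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Declares a local double with an explicit 8-byte alignment request and
// reports whether the address of that local satisfies the request.
bool aligned_local_is_8_byte_aligned() {
  alignas(8) double value = 0.0;
  return is_aligned_to(&value, 8);
}
```

On 64-bit targets the request is typically already satisfied by the type's natural alignment, which is consistent with the observation that the specifier is effectively a nop there.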
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13258#issuecomment-1504466526

From amitkumar at openjdk.org Wed Apr 12 04:01:36 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 12 Apr 2023 04:01:36 GMT
Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390
In-Reply-To:
References:
Message-ID: <6MJ0PtCuMB6Ff9T2ePrNsQVEJ7Ic8D_KxTR9GCAi6BU=.a4062470-c70e-47e3-88d3-e5485f07a351@github.com>

On Tue, 31 Jan 2023 11:40:09 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

I have tested release & fast debug build. Tier1 test on fastdebug as well. PR seems clean. Thank you so much for the changes.

-------------

Marked as reviewed by amitkumar (Author).
PR Review: https://git.openjdk.org/jdk/pull/12325#pullrequestreview-1380428664 From amenkov at openjdk.org Wed Apr 12 04:07:36 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 12 Apr 2023 04:07:36 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v5] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 02:17:14 GMT, David Holmes wrote: > If an object is only reachable from the stack of a VT and the VT itself is not followed then we don't find that object either. But as I said a VT should be found via the object it is parked on, or via the scheduler. So is it the case that the current logic will not follow the stack of an unmounted Virtual thread? If so that seems wrong - especially if a mounted VT would find those objects. A VT is not a GC root but now I'm unsure exactly what that means. Right, unmounted VTs are reachable from other objects. And that's correct that current logic does not follow their stack references (current logic considers VT object as normal object, i.e. follows reference to its class and to its fields (iterate_over_object method, line 2687)). Mounted VT does follow stack references as its stack is a part of carrier thread stack and carrier stack locals are followed (collect_stack_roots method, line 2775). The fix just splits carrier thread stack and report some references as references from VT stack (not carrier thread stack). As for "heap roots" FollowReferences spec says: `The heap root are the set of system classes, JNI globals, references from thread stacks, and other objects used as roots for the purposes of garbage collection.` As far as I understand the idea here is all other heap objects are reachable from "heap roots". 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1504564223 From dholmes at openjdk.org Wed Apr 12 04:29:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Apr 2023 04:29:35 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v5] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 04:05:07 GMT, Alex Menkov wrote: > current logic does not follow their stack references Okay that needs to be fixed then. Apologies if that is what you already were doing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1504587109 From iklam at openjdk.org Wed Apr 12 05:00:42 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Apr 2023 05:00:42 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v2] In-Reply-To: References: Message-ID: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). 
For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - more clean up: heap_regions -> heap_region, etc - @matias9927 comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13284/files - new: https://git.openjdk.org/jdk/pull/13284/files/a852dfbb..a1a3cac7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=00-01 Stats: 116 lines in 12 files changed: 11 ins; 46 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/13284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284 PR: https://git.openjdk.org/jdk/pull/13284 From iklam at openjdk.org Wed Apr 12 05:00:49 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Apr 2023 05:00:49 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v2] In-Reply-To: References: Message-ID: <-wObHrEhbZ2UN9T88NDX3RNnkn3RLuC3BI4KUXSDY80=.1f276277-08a9-44c7-ae2c-7181a3e6b873@github.com> On Fri, 7 Apr 2023 19:17:46 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - more clean up: heap_regions -> heap_region, etc >> - @matias9927 comments > > src/hotspot/share/cds/archiveBuilder.cpp line 1086: > >> 1084: p2i(to_requested(start)), size_t(end - start)); >> 1085: 
log_data(start, end, to_requested(start), /*is_heap=*/true); >> 1086: } > > These log messages can be placed inside the else case before the break Fixed. > src/hotspot/share/cds/archiveHeapWriter.cpp line 369: > >> 367: template void ArchiveHeapWriter::store_requested_oop_in_buffer(T* buffered_addr, >> 368: oop request_oop) { >> 369: //assert(is_in_requested_regions(request_oop), "must be"); > > Some left over commented code. I assume this should be removed or a new assert should be here to replace it. I fixed the assert. > src/hotspot/share/cds/archiveHeapWriter.cpp line 529: > >> 527: num_non_null_ptrs ++; >> 528: >> 529: if (max_idx < idx) { > > Is there a built in min() function we can use here? Maybe std::min()? Updated with the `MAX2()` macro. > src/hotspot/share/cds/filemap.cpp line 1674: > >> 1672: >> 1673: char* buffer = NEW_C_HEAP_ARRAY(char, size_in_bytes, mtClassShared); >> 1674: size_t written = write_bitmap(ptrmap, buffer, 0); > > Maybe add a comment to clarify there is no offset? Constants in method parameters can be confusing sometimes. I changed the code to pass "written" as a parameter similar to the other two calls. Also added comments. > src/hotspot/share/cds/filemap.cpp line 2035: > >> 2033: } >> 2034: if (end < e) { >> 2035: end = e; > > Like mentioned before, maybe we have max() and min() methods to use here. I simplified the code -- there's only one range now so the start/end can be easily determined. > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 520: > >> 518: } else { >> 519: return true; >> 520: } > > Maybe make this `return reserved.contains(range.start()) && reserved.contains(range.last())` Fixed. 
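The last simplification discussed above (returning the condition directly rather than branching to `return true` / `return false`) can be illustrated with a small stand-alone sketch. The `Region` type and both function names are hypothetical stand-ins, not HotSpot's `MemRegion` or the actual G1 code, and the "before" shape is only assumed from the quoted `else { return true; }` fragment:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a half-open address range [start, end).
struct Region {
  std::size_t start;
  std::size_t end;  // exclusive

  bool contains(std::size_t addr) const { return start <= addr && addr < end; }

  // Last address inside the range; only meaningful for a non-empty range.
  std::size_t last() const {
    assert(start < end);
    return end - 1;
  }
};

// Assumed "before" shape: the result is spelled out with an if/else.
bool is_within_verbose(const Region& reserved, const Region& range) {
  if (!reserved.contains(range.start) || !reserved.contains(range.last())) {
    return false;
  } else {
    return true;
  }
}

// Suggested shape: return the condition itself.
bool is_within(const Region& reserved, const Region& range) {
  return reserved.contains(range.start) && reserved.contains(range.last());
}
```

Both functions compute the same predicate; the second simply removes the branch, which is the kind of change the reviewer asked for.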
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163601901 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163601972 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163602280 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163602339 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163602400 PR Review Comment: https://git.openjdk.org/jdk/pull/13284#discussion_r1163602433 From stefank at openjdk.org Wed Apr 12 05:29:21 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 12 Apr 2023 05:29:21 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> <6vD1PFLLelAVWsCl3YpuPBhd_tuc-xlE3wH_HCp7Lu8=.6b9ed684-f94c-434e-82df-15003ded284d@github.com> Message-ID: On Wed, 12 Apr 2023 02:08:08 GMT, David Holmes wrote: >> That seems fine to me, as long as we don't crash. But my understanding is that Generational ZGC will crash if it sees a stale oop. Isn't it possible that the racing read sees junk that looks to Generational ZGC like a stale oop? To avoid this, unused slots may need to be set to nullptr even in product builds. But I'm not a GC expert so maybe there's no problem. > > The old code is "racy but safe" - it basically answers the question "what thread held the lock at the time I was asking?" and if we get a stack-addr as the owner at the time we ask, and that stack-address belongs to a given thread t then we report t as the owner. The fact t may have released the lock as soon as we read the stack-addr is immaterial. > > The new code may be a different matter however. Now the race involves oops, and potentially stale ones IIUC what Stefan is saying. So now the race is not safe, and potentially may crash.
> That seems fine to me, as long as we don't crash. But my understanding is that Generational ZGC will crash if it sees a stale oop. Isn't it possible that the racing read sees junk that looks to Generational ZGC like a stale oop? To avoid this, unused slots may need to be set to nullptr even in product builds. But I'm not a GC expert so maybe there's no problem. Generational ZGC has verification code in fastdebug builds that tries to detect stale oops. However, the current LockStack implementation seems to always clear unused slots when running in debug builds. That minimizes the risk that the verification code would find stale oops in the LockStack. Regarding release builds, given that the LockStack code doesn't dereference any of the contained oops and we don't have oop verification code in release builds, I don't see how ZGC would crash because of this race. Note however that these kinds of races are technically undefined behavior, so I wouldn't be too confident that this code is safe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163627980 From iklam at openjdk.org Wed Apr 12 05:29:45 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Apr 2023 05:29:45 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v3] In-Reply-To: References: Message-ID: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off.
> - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into 8298048-combine-cds-heap-to-single-region-PUSH - more clean up: heap_regions -> heap_region, etc - @matias9927 comments - Remove archive region types from G1 - clean up (1) - 8298048: Combine CDS archive heap into a single block ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13284/files - new: https://git.openjdk.org/jdk/pull/13284/files/a1a3cac7..b693d27c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=01-02 Stats: 33051 lines in 911 files changed: 16125 ins; 14331 del; 2595 mod Patch: https://git.openjdk.org/jdk/pull/13284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284 PR: https://git.openjdk.org/jdk/pull/13284 From stefank at openjdk.org Wed Apr 12 06:32:30 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 12 Apr 2023 06:32:30 GMT Subject: RFR: 8305880: Loom: Avoid putting stale object pointers in oops Message-ID: Generational ZGC has extra verification code for oops, which triggers asserts when it finds stale oops. We have cleaned up some usages of stale oops in the upstream repository (openjdk/jdk), but there are still a couple left in the Loom code. I propose that we rewrite the code, to pave the way for Generational ZGC. I've tested this by running these patches on top of openjdk/fibers + ZGC. I've also tested this with Skynet + Generational ZGC, where these issues were first found.
------------- Commit messages: - Review Ron - Introduce derived_base type - Don't create stale oops in DerivedPointersSupport::relativize - Don't create stale oops in ~SafepointOp Changes: https://git.openjdk.org/jdk/pull/13439/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13439&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305880 Stats: 48 lines in 10 files changed: 9 ins; 2 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/13439.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13439/head:pull/13439 PR: https://git.openjdk.org/jdk/pull/13439 From jbechberger at openjdk.org Wed Apr 12 06:52:49 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 12 Apr 2023 06:52:49 GMT Subject: Integrated: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 15:57:40 GMT, Johannes Bechberger wrote: > Fixes the issue by disabling PCDesc cache modifications when in ASGCT. > > Tested on my M1 mac. This pull request has now been integrated. Changeset: d8af7a60 Author: Johannes Bechberger Committer: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/d8af7a6014055295355a1242db6c2872299c6398 Stats: 33 lines in 3 files changed: 32 ins; 0 del; 1 mod 8304725: AsyncGetCallTrace can cause SIGBUS on M1 Reviewed-by: dholmes, stuefe, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/13144 From qamai at openjdk.org Wed Apr 12 06:56:51 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Apr 2023 06:56:51 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 17:47:56 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> special case iotaShuffle > > Marked as reviewed by jbhateja (Reviewer). 
@jatin-bhateja @iwanowww Thanks a lot for your approvals, I will integrate the patch ------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1504758215 From pminborg at openjdk.org Wed Apr 12 07:00:46 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 12 Apr 2023 07:00:46 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v21] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into PR_21_V2 - 8305369: Issues in zero-length memory segment javadoc section - 8305087: MemoryLayout API checks should be more eager - Merge master - Improve code snipet - Update JEP number and name - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> - Cleanup finality - Merge pull request #1 from JornVernee/Fix_ULE Fix ULE when intializing LibFallback - ... 
and 26 more: https://git.openjdk.org/jdk/compare/d8af7a60...5de90878 ------------- Changes: https://git.openjdk.org/jdk/pull/13079/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=20 Stats: 13418 lines in 270 files changed: 5099 ins; 6182 del; 2137 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From jwaters at openjdk.org Wed Apr 12 07:06:39 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 12 Apr 2023 07:06:39 GMT Subject: RFR: 8250269: Replace ATTRIBUTE_ALIGNED with alignas [v15] In-Reply-To: References: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> Message-ID: <8O5-sdYdokB5D2Pqx01aW3C3RgNZWmzaTIeaJH-jcqw=.c30e59ae-3b1a-43f1-900d-9e48711eb600@github.com> On Sat, 4 Feb 2023 15:05:06 GMT, Julian Waters wrote: >> C++11 added the alignas attribute, for the purpose of specifying alignment on types, much like compiler specific syntax such as gcc's __attribute__((aligned(x))) or Visual C++'s __declspec(align(x)). >> >> We can phase out the use of the macro in favor of the standard attribute. In the meantime, we can replace the compiler specific definitions of ATTRIBUTE_ALIGNED with a portable definition. We might deprecate the use of the macro but changing its implementation quickly and cleanly applies the feature where the macro is being used. >> >> Note: With certain parts of HotSpot using ATTRIBUTE_ALIGNED so indiscriminately, this commit will likely take some time to get right >> >> This will require adding the alignas attribute to the list of language features approved for use in HotSpot code. (Completed with [8297912](https://github.com/openjdk/jdk/pull/11446)) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - ... and 5 more: https://git.openjdk.org/jdk/compare/1cddb126...a621bb62 Ah, sorry about that, I'll remove the issue from the other PR ------------- PR Comment: https://git.openjdk.org/jdk/pull/11431#issuecomment-1504769818 From pminborg at openjdk.org Wed Apr 12 07:21:46 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 12 Apr 2023 07:21:46 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v22] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #2 from JornVernee/WSL_BB account for missing functional in WSL in TestByteBuffer - account for missing mincore on WSL in TestByteBuffer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/5de90878..6164abe8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=20-21 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: 
https://git.openjdk.org/jdk/pull/13079 From pminborg at openjdk.org Wed Apr 12 07:21:49 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 12 Apr 2023 07:21:49 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v20] In-Reply-To: References: Message-ID: <6e7bW67JaR5Lv4INrepQU7yN49yx9L4QUXqQ56EiK_g=.25fc906c-871a-4201-b06c-e173120f5204@github.com> On Tue, 11 Apr 2023 17:48:37 GMT, Jorn Vernee wrote: >> Per Minborg has updated the pull request incrementally with two additional commits since the last revision: >> >> - 8305369: Issues in zero-length memory segment javadoc section >> - 8305087: MemoryLayout API checks should be more eager > > test/jdk/java/foreign/TestByteBuffer.java line 317: > >> 315: >> 316: @Test >> 317: public void testMappedSegmentAsByteBuffer() throws Throwable { > > While testing something else, I noticed that this test is failing on Linux/WSL since the WSL 1 kernel does not implement `mincore`. We can simply skip the test in that case, as we already do for other tests. I've submitted a patch here: https://github.com/minborg/jdk/pull/2 Thanks for providing this patch @JornVernee ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1163721611 From alanb at openjdk.org Wed Apr 12 07:47:40 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 12 Apr 2023 07:47:40 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 29 Mar 2023 11:28:53 GMT, Aleksey Shipilev wrote: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. 
However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` src/hotspot/share/include/jvm.h line 279: > 277: > 278: JNIEXPORT void JNICALL > 279: JVM_Sleep(JNIEnv *env, jclass threadClass, jlong millis, jint nanos); I wonder if it would be simpler to just provide a single value, in nanoseconds, to the VM. That's enough for a sleep of 292 years. Windows would still need to convert to milliseconds of course but it overall would avoid sending two values down to the park code. 
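As a rough illustration of the single-value idea in the comment above, the following Java sketch folds a `(millis, nanos)` pair into one nanosecond count and checks the "292 years" figure. The class `SleepNanosSketch` and its methods are hypothetical names made up for this example; they are not part of the PR or of `java.lang.Thread`, and saturating at `Long.MAX_VALUE` is an assumption about what a caller would want on overflow.

```java
import java.util.concurrent.TimeUnit;

final class SleepNanosSketch {

    // Hypothetical helper (not from the PR): combine the two arguments of
    // Thread.sleep(millis, nanos) into a single nanosecond count, saturating
    // at Long.MAX_VALUE instead of overflowing.
    static long toTotalNanos(long millis, int nanos) {
        if (millis < 0) {
            throw new IllegalArgumentException("timeout value is negative");
        }
        if (nanos < 0 || nanos > 999_999) {
            throw new IllegalArgumentException("nanosecond timeout value out of range");
        }
        // TimeUnit conversions saturate rather than wrap on overflow.
        long total = TimeUnit.MILLISECONDS.toNanos(millis);
        // total is non-negative, so a wrapped sum shows up as a negative value.
        long sum = total + nanos;
        return sum < 0 ? Long.MAX_VALUE : sum;
    }

    // Longest sleep a signed 64-bit nanosecond count can express, in whole
    // 365-day years.
    static long maxYears() {
        return TimeUnit.NANOSECONDS.toDays(Long.MAX_VALUE) / 365;
    }

    public static void main(String[] args) {
        System.out.println(toTotalNanos(1, 500_000)); // 1.5 ms expressed in ns
        System.out.println(maxYears());
    }
}
```

With a single `long` like this, only a platform whose timers work in milliseconds (Windows, per the comment above) would need to convert back before sleeping.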
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1163750704 From duke at openjdk.org Wed Apr 12 07:49:19 2023 From: duke at openjdk.org (Milind Mantri) Date: Wed, 12 Apr 2023 07:49:19 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 11:59:45 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. 
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. 
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > RISCV update src/hotspot/share/utilities/globalDefinitions.hpp line 1050: > 1048: // Legacy stack-locking, with monitors as 2nd tier > 1049: LEGACY = 1, > 1050: // New lightweight locing, with monitors as 2nd tier Suggestion: // New lightweight locking, with monitors as 2nd tier I was just passing by your PR. Pointing out a minor typo. Cheers! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163659759 From eosterlund at openjdk.org Wed Apr 12 09:22:34 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 12 Apr 2023 09:22:34 GMT Subject: RFR: 8305880: Loom: Avoid putting stale object pointers in oops In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 06:25:17 GMT, Stefan Karlsson wrote: > Generational ZGC has extra verification code for oops, which triggers asserts when it finds stale oops. We have cleaned away some usages of stale oops in the upstream repository (openjdk/jdk), but there are still a couple left in the Loom code.
I propose that we rewrite the code, to pave the way for Generational ZGC. > > I've tested this by running these patches on top of openjdk/fibers + ZGC. I've also tested this with Skynet + Generational ZGC, where these issues were first found. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13439#pullrequestreview-1380889505 From sspitsyn at openjdk.org Wed Apr 12 10:13:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 12 Apr 2023 10:13:49 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
>> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
>> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization ? as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. 
>> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > renames src/hotspot/share/prims/jvmtiAgent.cpp line 265: > 263: // For statically linked agents we cant't rely on os_lib == nullptr because > 264: // statically linked agents could have a handle of RTLD_DEFAULT which == 0 on some platforms. > 265: // If this function returns true, then agent->is_static_lib().&& agent->is_loaded(). Nit: replace : ".&&" => "&&" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1163921271 From sspitsyn at openjdk.org Wed Apr 12 10:35:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 12 Apr 2023 10:35:44 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime.
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. 
>> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization ? 
as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > renames src/hotspot/share/prims/jvmtiAgent.cpp line 323: > 321: assert(agent != nullptr, "invariant"); > 322: if (!agent->is_loaded()) { > 323: if (!load_agent_from_executable(agent, on_load_symbols, num_symbol_entries)) { It feels like I'm missing something. We already checked and found at line 322 that `agent->is_loaded() == false`. Also, we have the comment at line 265: 265 // If this function returns true, then agent->is_static_lib().&& agent->is_loaded(). 266 static bool load_agent_from_executable(Agent* agent, const char* on_load_symbols[], size_t num_symbol_entries) { Since `agent->is_loaded() == false`, the condition `agent->is_static_lib() && agent->is_loaded()` has to be `false` and cannot be `true`.
Then the if-check at line 323 is not needed and can be removed. Is that right? Otherwise, the comment at line 265 may be incorrect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1163943712 From rkennke at openjdk.org Wed Apr 12 10:45:19 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 12 Apr 2023 10:45:19 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v57] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements).
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. 
In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
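To make the lock-stack description above more concrete, here is a minimal sketch in C++. It is not the code from this PR: the names `LockStackSketch`, `Obj`, `kCapacity`, `try_push` and `try_pop` are hypothetical, and the sketch models only the per-thread bookkeeping (fixed size of 8, linear-scan ownership query, fall back to the slow path when the stack is full or the lock is recursive). The CAS on the object header and the inflation itself are deliberately left out.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an object reference (oop); not a HotSpot type.
using Obj = const void*;

// Illustrative per-thread lock stack: a small fixed-size array holding the
// objects this thread currently has fast-locked.
class LockStackSketch {
 public:
  static constexpr std::size_t kCapacity = 8;  // fixed size, as described above

  bool is_full() const { return _top == kCapacity; }
  std::size_t size() const { return _top; }

  // "Does the current thread own me?" answered by a linear scan of the array.
  bool contains(Obj o) const {
    for (std::size_t i = 0; i < _top; ++i) {
      if (_elems[i] == o) return true;
    }
    return false;
  }

  // Returns true if the fast path may proceed. Returns false when the caller
  // would have to take the slow path and inflate: either the stack is full
  // or the lock is recursive (already held by this thread).
  bool try_push(Obj o) {
    if (is_full() || contains(o)) return false;
    _elems[_top++] = o;
    return true;
  }

  // Pops o only if it is the most recently pushed entry (balanced unlock).
  bool try_pop(Obj o) {
    if (_top == 0 || _elems[_top - 1] != o) return false;
    --_top;
    return true;
  }

 private:
  std::array<Obj, kCapacity> _elems{};
  std::size_t _top = 0;
};
```

With a structure like this, the common ownership check stays a short scan over at most eight entries, which is consistent with the observation above that most workloads hold only a handful of locks per thread at a time.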
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Bunch of comments and typos - Don't use NativeAccess in LockStack::contains() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/d1c88261..cb260c1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=56 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=55-56 Stats: 8 lines in 5 files changed: 4 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From mgronlun at openjdk.org Wed Apr 12 10:47:09 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 12 Apr 2023 10:47:09 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 10:31:37 GMT, Serguei Spitsyn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> renames > > src/hotspot/share/prims/jvmtiAgent.cpp line 323: > >> 321: assert(agent != nullptr, "invariant"); >> 322: if (!agent->is_loaded()) { >> 
323: if (!load_agent_from_executable(agent, on_load_symbols, num_symbol_entries)) { > > It feels like I'm missing something. > We already checked and found at line 322 that `agent->is_loaded() == false`. > Also, we have the comment at line 265: > > 265 // If this function returns true, then agent->is_static_lib().&& agent->is_loaded(). > 266 static bool load_agent_from_executable(Agent* agent, const char* on_load_symbols[], size_t num_symbol_entries) { > > As the `agent->is_loaded() == false` then the condition `agent->is_static_lib() && agent->is_loaded()` has to be `false` and can not be `true`. Then one of the if-checks at lines 322 and 323 is not needed and can be removed. Is it right? Otherwise, the comment at line 265 can be incorrect. Good observation, Serguei. It is because some paths call into lookup_On_load_Entry_point() twice. It is primarily the attempted conversion of xrun agents, the first invocation comes from JvmtiAgent::convert_xrun_agent(). This will have the agent "loaded". If there is an Agent_OnLoad function, the agent is converted (i.e. xrun removed). Then when the agent is to invoke the Agent_OnLoad function, there is a second invocation. Here a converted xrun library is already loaded, so I bypass attempting to load it again by checking the is_loaded() property. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1163954754 From mbaesken at openjdk.org Wed Apr 12 11:03:42 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 12 Apr 2023 11:03:42 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:55:07 GMT, Matthias Baesken wrote: >> The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. >> However at some places INCLUDE_CDS is missing and should be added. 
>> >> One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust arguments handling So considering the low interest in adding the if INCLUDE_CDS checks in the codebase, there is still the fact that `@requires vm.cds` is missing in quite a few tests (see the list above). So what do you think about closing this PR and instead fixing the missing required comments (probably in a separate issue)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12691#issuecomment-1505075884 From sspitsyn at openjdk.org Wed Apr 12 11:06:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 12 Apr 2023 11:06:49 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. 
>> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
>> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization ? as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. 
>> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > renames I've posted a couple of questions now. There can be more later. Sorry for the latency again. src/hotspot/share/prims/jvmtiAgent.cpp line 357: > 355: vm_exit_during_initialization("Could not find JVM_OnLoad or Agent_OnLoad function in the library", name()); > 356: } > 357: _xrun = false; // converted Just questions to understand it better. Neither `JVM_Onload` nor `Agent_Onload` entry points are stored after these lookups. It means that in order to be called later (as the comment at line 350 says) they have to be looked up again. Is it right? Was it the same originally? ------------- PR Review: https://git.openjdk.org/jdk/pull/12923#pullrequestreview-1381065121 PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1163972452 From mgronlun at openjdk.org Wed Apr 12 11:11:41 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 12 Apr 2023 11:11:41 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. 
>> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. 
>> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. 
>> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization ? as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > renames > Thank you for having a look. I think I have answered them. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1505086800 From mgronlun at openjdk.org Wed Apr 12 11:11:45 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 12 Apr 2023 11:11:45 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 11:01:43 GMT, Serguei Spitsyn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> renames > > src/hotspot/share/prims/jvmtiAgent.cpp line 357: > >> 355: vm_exit_during_initialization("Could not find JVM_OnLoad or Agent_OnLoad function in the library", name()); >> 356: } >> 357: _xrun = false; // converted > > Just questions to understand it better. > Neither `JVM_Onload` nor `Agent_Onload` entry points are stored after these lookups. It means that in order to be called later (as the comment at line 350 says) they have to be looked up again. > Is it right? Was it the same originally? The entry points are not saved and so have to be looked up again. It was the same originally. That is why there is a check and branch on agent->is_loaded(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1163979282 From lgxbslgx at gmail.com Wed Apr 12 13:02:45 2023 From: lgxbslgx at gmail.com (Guoxiong Li) Date: Wed, 12 Apr 2023 21:02:45 +0800 Subject: [Investigation] Considering using a hashtable to store the signature handlers In-Reply-To: <7d49663e-6a97-c1ff-e41e-cab3c04c3f26@littlepinkcloud.com> References: <7d49663e-6a97-c1ff-e41e-cab3c04c3f26@littlepinkcloud.com> Message-ID: On Tue, Apr 11, 2023 at 5:00 PM Andrew Haley wrote: > I would measure the time taken for the operations of insertion and lookup > over a realistic range. > Thanks for the test. The draft patch [1] may be useful to you. 
[1] https://github.com/lgxbslgx/jdk/tree/SIGNATURE_HANDLERS Best Regards, -- Guoxiong From alanb at openjdk.org Wed Apr 12 14:58:36 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 12 Apr 2023 14:58:36 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v7] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 01:12:49 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected and reported as JVMTI_HEAP_REFERENCE_THREAD; >> - stack references for unmounted VT are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames is moved into separate class; > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Fixed indent in collect_vthread_stack_roots In the spec for FollowReferences, it says that the heap roots include "references from thread stacks". There is a similar sentence in the deprecated IterateOverReachableObjects function. We should decide whether these sentences need to be changed to say "platform thread stacks". 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1505423293 From eliu at openjdk.org Wed Apr 12 16:09:41 2023 From: eliu at openjdk.org (Eric Liu) Date: Wed, 12 Apr 2023 16:09:41 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: <4vt3o6jU1_qUlYB4YtkXOUmG8Gi9NzRUHXjYqboYlPU=.3edc8876-bf2a-40a0-bb3e-3c5ea2aea3d5@github.com> On Tue, 4 Apr 2023 13:46:12 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style src/hotspot/share/opto/vectorIntrinsics.cpp line 1953: > 1951: Node* v1 = unbox_vector(argument(3), vbox_type, elem_bt, num_elem); > 1952: Node* v2 = unbox_vector(argument(4), vbox_type, elem_bt, num_elem); > 1953: if (v1 == NULL || v2 == NULL) { nullptr is more common. 
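As a reference point for the review comments on this PR, the slice semantics quoted from the description above (concatenate the two inputs, then extract a window the size of one input) can be modeled in scalar C++. This is only an illustrative sketch, not C2 or Vector API code: `slice_model` is a hypothetical name, and it assumes `origin` lies in the range 0 to N inclusive.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of slicing: result[i] is element (origin + i)
// of the 2N-element composite formed by v1 followed by v2.
// Assumes 0 <= origin <= N; not HotSpot or Vector API code.
template <typename T, std::size_t N>
std::array<T, N> slice_model(std::size_t origin,
                             const std::array<T, N>& v1,
                             const std::array<T, N>& v2) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t j = origin + i;          // index into the composite
    result[i] = (j < N) ? v1[j] : v2[j - N];   // first half is v1, second is v2
  }
  return result;
}
```

Under this model an origin of 0 returns the first input unchanged and an origin of N returns the second input, which are the two boundary cases a constant-origin simplification would rely on.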
src/hotspot/share/opto/vectornode.cpp line 1999: > 1997: // (VectorSlice X Y 0) => X > 1998: // (VectorSlice X Y VLENGTH) => Y > 1999: if (origin->is_con(0)) { is_con(0) is pre defined as TypeInt::ZERO. src/hotspot/share/opto/vectornode.cpp line 2001: > 1999: if (origin->is_con(0)) { > 2000: return in(1); > 2001: } else if (origin->is_con(Matcher::vector_length(this))) { If they were the same, length() looks simple. Suggestion: } else if (origin->is_con(length())) { src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 635: > 633: } > 634: > 635: @ForceInline May I ask why `forceInline` here is necessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1164311234 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1164328793 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1164346771 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1164348507 From pchilanomate at openjdk.org Wed Apr 12 16:14:29 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 12 Apr 2023 16:14:29 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) Message-ID: Please review this fix. The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. 
Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. The fix was tested against a reproducer which I attached to the bug. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/13446/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13446&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305625 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13446/head:pull/13446 PR: https://git.openjdk.org/jdk/pull/13446 From duke at openjdk.org Wed Apr 12 17:29:52 2023 From: duke at openjdk.org (duke) Date: Wed, 12 Apr 2023 17:29:52 GMT Subject: Withdrawn: 8298091: Dump native instruction along with nmethod name when using Compiler.codelist In-Reply-To: <1dSP2mbbyqiKLRmWwCfwGdb67ll7-K4cRtV_muUkS9I=.2d2dc2f8-acca-49de-89f5-70a825c057fa@github.com> References: <1dSP2mbbyqiKLRmWwCfwGdb67ll7-K4cRtV_muUkS9I=.2d2dc2f8-acca-49de-89f5-70a825c057fa@github.com> Message-ID: <9eHublccmZCiRVn1sPJZCfD6k8d6iQGA2txUt-Ne7cw=.fffa3d9c-9e65-487d-9f37-b093b1953392@github.com> On Thu, 2 Feb 2023 02:12:06 GMT, Yi Yang wrote: > This patch adds new functionality for Compiler.codelist, it optionally prints assembly code along with compiled method line. This allows us to inspect assembly code for specified JIT method on the fly, also a manageable flag ForceLoadDisassembler is added to load hs-dis if it was not initially present when JVM starts. 
> > The output looks like this: > > $ jcmd Compiler.codelist decode=Thread.interrupt > 76900: > ... > 2678 3 0 com.sun.tools.javac.api.JavacTaskPool$ReusableContext$1.scan(Lcom/sun/source/tree/Tree;Lcom/sun/tools/javac/code/Symtab;)Ljava/lang/Void; [0x00007fbe85105590, 0x00007fbe85105780 - 0x00007fbe85105ec0] > 2683 3 0 java.lang.Thread.interrupted()Z [0x00007fbe85106090, 0x00007fbe85106220 - 0x00007fbe85106488] > [Disassembly] > -------------------------------------------------------------------------------- > [Constant Pool (empty)] > > -------------------------------------------------------------------------------- > > [MachCode] > 0x00007fbe85106220: 8984 2400 | c0fe ff55 | 4883 ec40 | 4181 7f20 | 0700 0000 | 7405 e8a5 | 8af9 0648 | be28 d2ca > 0x00007fbe85106240: 3cbe 7f00 | 008b bef4 | 0000 0083 | c702 89be | f400 0000 | 81e7 fe07 | 0000 83ff | 000f 8461 > 0x00007fbe85106260: 0100 0048 | be28 d2ca | 3cbe 7f00 | 0048 8386 | 3801 0000 | 0149 8bb7 | a802 0000 | 488b 3648 > 0x00007fbe85106280: 3b06 488b | fe48 bb28 | d2ca 3cbe | 7f00 008b | 7f08 49ba | 0000 0000 | 0800 0000 | 4903 fa48 > 0x00007fbe851062a0: 3bbb 5801 | 0000 750d | 4883 8360 | 0100 0001 | e960 0000 | 0048 3bbb | 6801 0000 | 750d 4883 > 0x00007fbe851062c0: 8370 0100 | 0001 e94a | 0000 0048 | 83bb 5801 | 0000 0075 | 1748 89bb | 5801 0000 | 48c7 8360 > 0x00007fbe851062e0: 0100 0001 | 0000 00e9 | 2900 0000 | 4883 bb68 | 0100 0000 | 7517 4889 | bb68 0100 | 0048 c783 > 0x00007fbe85106300: 7001 0000 | 0100 0000 | e908 0000 | 0048 8383 | 4801 0000 | 0148 bfc8 | 7036 3cbe | 7f00 008b > 0x00007fbe85106320: 9ff4 0000 | 0083 c302 | 899f f400 | 0000 81e3 | feff 1f00 | 83fb 000f | 84ad 0000 | 000f be7e > 0x00007fbe85106340: 3683 ff00 | 48bb c870 | 363c be7f | 0000 48b8 | 3801 0000 | 0000 0000 | 0f84 0a00 | 0000 48b8 > 0x00007fbe85106360: 4801 0000 | 0000 0000 | 488b 1403 | 488d 5201 | 4889 1403 | 0f84 2e00 | 0000 897c | 2428 bb00 > 0x00007fbe85106380: 0000 0088 | 5e36 f083 | 4424 c000 | 48be c870 | 363c be7f | 
0000 4883 | 8658 0100 | 0001 90e8 > 0x00007fbe851063a0: 5c11 fb06 | 8b7c 2428 | 83e7 0183 | e701 488b | c748 83c4 | 405d 493b | a778 0300 | 000f 8748 > 0x00007fbe851063c0: 0000 00c3 | 49ba 98aa | 4200 0800 | 0000 4c89 | 5424 0848 | c704 24ff | ffff ffe8 | 20f7 0607 > 0x00007fbe851063e0: e97e feff | ffe8 1684 | 0607 49ba | 7880 0100 | 0800 0000 | 4c89 5424 | 0848 c704 | 24ff ffff > 0x00007fbe85106400: ffe8 faf6 | 0607 e932 | ffff ff49 | bab6 6310 | 85be 7f00 | 004d 8997 | 9003 0000 | e95f 77fb > 0x00007fbe85106420: 0649 8b87 | 2804 0000 | 49c7 8728 | 0400 0000 | 0000 0049 | c787 3004 | 0000 0000 | 0000 4883 > 0x00007fbe85106440: c440 5de9 | b86b 0607 | e833 af06 | 0748 bf42 | 29a7 a2be | 7f00 0048 | 83e4 f0e8 | b097 4b1d > 0x00007fbe85106460: f449 ba61 | 6410 85be | 7f00 0041 | 52e9 ae69 | fb06 48bb | 0000 0000 | 0000 0000 | e9fb ffff > 0x00007fbe85106480: fff4 f4f4 | f4f4 f4f4 > [/MachCode] > -------------------------------------------------------------------------------- > [/Disassembly] > 2684 3 0 java.io.FileInputStream.read()I [0x00007fbe85106590, 0x00007fbe85106740 - 0x00007fbe85106928] > 2686 3 0 jdk.internal.org.jline.utils.NonBlockingInputStream.read(J)I [0x00007fbe85106a90, 0x00007fbe85106c20 - 0x00007fbe85106de0] > 2687 3 0 jdk.internal.org.jline.terminal.impl.AbstractPty.checkInterrupted()V [0x00007fbe85106e90, 0x00007fbe85107060 - 0x00007fbe85107458] > > This is a common situation in production environment. Few applications will bring hsdis at startup, but when we really need it, we seem to have no good way except to restart application. 
Now, we can turn on ForceLoadDisassembler and load hsdis dynamically without restarting: > > $ jcmd Compiler.codelist decode=Thread.interrupt > 2679 3 0 com.sun.source.util.TreeScanner.scan(Lcom/sun/source/tree/Tree;Ljava/lang/Object;)Ljava/lang/Object; [0x00007fbe85105110, 0x00007fbe851052a0 - 0x00007fbe851054c8] > 2678 3 0 com.sun.tools.javac.api.JavacTaskPool$ReusableContext$1.scan(Lcom/sun/source/tree/Tree;Lcom/sun/tools/javac/code/Symtab;)Ljava/lang/Void; [0x00007fbe85105590, 0x00007fbe85105780 - 0x00007fbe85105ec0] > 2683 3 0 java.lang.Thread.interrupted()Z [0x00007fbe85106090, 0x00007fbe85106220 - 0x00007fbe85106488] > [Disassembly] > -------------------------------------------------------------------------------- > [Constant Pool (empty)] > > -------------------------------------------------------------------------------- > > [Verified Entry Point] > # {method} {0x000000080042aa98} 'interrupted' '()Z' in 'java/lang/Thread' > # [sp+0x50] (sp of caller) > 0x00007fbe85106220: mov %eax,-0x14000(%rsp) > 0x00007fbe85106227: push %rbp > 0x00007fbe85106228: sub $0x40,%rsp > 0x00007fbe8510622c: cmpl $0x7,0x20(%r15) > 0x00007fbe85106234: je 0x00007fbe8510623b > .... 
> 0x00007fbe8510643e: add $0x40,%rsp > 0x00007fbe85106442: pop %rbp > 0x00007fbe85106443: jmpq 0x00007fbe8c16d000 ; {runtime_call unwind_exception Runtime1 stub} > [Exception Handler] > 0x00007fbe85106448: callq 0x00007fbe8c171380 ; {no_reloc} > 0x00007fbe8510644d: mov $0x7fbea2a72942,%rdi ; {external_word} > 0x00007fbe85106457: and $0xfffffffffffffff0,%rsp > 0x00007fbe8510645b: callq 0x00007fbea25bfc10 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 0x00007fbe85106460: hlt > [Deopt Handler Code] > 0x00007fbe85106461: mov $0x7fbe85106461,%r10 ; {section_word} > 0x00007fbe8510646b: push %r10 > 0x00007fbe8510646d: jmpq 0x00007fbe8c0bce20 ; {runtime_call DeoptimizationBlob} > 0x00007fbe85106472: mov $0x0,%rbx ; {static_stub} > 0x00007fbe8510647c: jmpq 0x00007fbe8510647c ; {runtime_call} > 0x00007fbe85106481: hlt > 0x00007fbe85106482: hlt > 0x00007fbe85106483: hlt > 0x00007fbe85106484: hlt > 0x00007fbe85106485: hlt > 0x00007fbe85106486: hlt > 0x00007fbe85106487: hlt > -------------------------------------------------------------------------------- > [/Disassembly] > 2684 3 0 java.io.FileInputStream.read()I [0x00007fbe85106590, 0x00007fbe85106740 - 0x00007fbe85106928] > 2686 3 0 jdk.internal.org.jline.utils.NonBlockingInputStream.read(J)I [0x00007fbe85106a90, 0x00007fbe85106c20 - 0x00007fbe85106de0] > 2687 3 0 jdk.internal.org.jline.terminal.impl.AbstractPty.checkInterrupted()V [0x00007fbe85106e90, 0x00007fbe85107060 - 0x00007fbe85107458] > ... > > > A sample use case is we want to know where line of code we have high cache line contention once we know a JIT address from perf c2c tool: > > Cacheline 0x456017840 > -- Peer Snoop -- ------- Store Refs ------ ------- CL -------- ---------- cycles ---------- Total cpu > Rmt Lcl L1 Hit L1 Miss N/A Off Node PA cnt Code address rmt peer lcl peer load records cnt Symbol > 0.00% 35.59% 0.00% 0.00% 0.00% 0x0 1 1 0xffff688f2a84 0 406 324 199524 1 [.] 
0x0000ffff688f2a84 [JIT] ti > 0.00% 33.12% 0.00% 0.00% 0.00% 0x0 1 1 0xffff688f2ab8 0 411 329 190202 1 [.] > ... > > But this example is too conservative. In fact, after adding this function, we can easily check the assembly representation of any JIT method, whether we find a potential performance problem with a JIT address, or we find it from the flame graph, or when we do some debugging. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12381 From iklam at openjdk.org Wed Apr 12 17:44:42 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Apr 2023 17:44:42 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v4] In-Reply-To: References: Message-ID: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. 
As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that use this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13284/files - new: https://git.openjdk.org/jdk/pull/13284/files/b693d27c..8ce6953e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=02-03 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284 PR: https://git.openjdk.org/jdk/pull/13284 From jvernee at openjdk.org Wed Apr 12 17:54:44 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 12 Apr 2023 17:54:44 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v20] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 17:59:32 GMT, Vladimir Ivanov wrote: >> Per Minborg has updated the pull request incrementally with two additional commits since the last revision: >> >> - 8305369: Issues in zero-length memory segment javadoc section >> - 8305087: MemoryLayout API checks should be more eager > > src/hotspot/share/prims/jvm.cpp line 3473: > >> 3471: >> 3472: JVM_LEAF(jboolean, JVM_IsForeignLinkerSupported(void)) >> 3473: return ForeignGlobals::has_port() ? JNI_TRUE : JNI_FALSE; > > On naming: I find `has_port()` confusing. Why don't you simply call VM counterpart `is_foreign_linker_supported`? Alternative ideas: `is_supported()`, `has_native_support()`. Thanks for the review.
I've created a patch for changing the name to `is_foreign_linker_supported` here: https://github.com/minborg/jdk/pull/3 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1164463475 From iklam at openjdk.org Wed Apr 12 17:59:23 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Apr 2023 17:59:23 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v5] In-Reply-To: References: Message-ID: <3wLQHgnmbwNVuVVZDe1Nlt6V6upyOrUXDxKKkMEOT8Y=.9a9c2b21-e683-4bd0-94ee-8ebcaf1b8333@github.com> > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that use this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant.
> > **Testing:** > - Mach5 tiers 1 ~ 7 Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra comments; some clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13284/files - new: https://git.openjdk.org/jdk/pull/13284/files/8ce6953e..30542e53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=03-04 Stats: 11 lines in 3 files changed: 1 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284 PR: https://git.openjdk.org/jdk/pull/13284 From iklam at openjdk.org Wed Apr 12 17:59:25 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Apr 2023 17:59:25 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block In-Reply-To: <3ybhZpVsw5iki0H2OkFswEIqFfEFC0YwNhp9chzu5yU=.8b29a446-3be0-4ae8-a8d0-948003be0411@github.com> References: <3ybhZpVsw5iki0H2OkFswEIqFfEFC0YwNhp9chzu5yU=.8b29a446-3be0-4ae8-a8d0-948003be0411@github.com> Message-ID: On Tue, 11 Apr 2023 21:47:40 GMT, Ashutosh Mehra wrote: > cds changes look good! just few nitpicks. Thanks for the review. I've incorporated your suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13284#issuecomment-1505698259 From rkennke at openjdk.org Wed Apr 12 19:21:11 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 12 Apr 2023 19:21:11 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v58] In-Reply-To: References: Message-ID: <12a9gxyIsM9NIHJyjPCqEcdtZGJPO9lgUxQdm6eYy70=.50d37ab5-e76c-4dc9-9d45-0cc60ddc7429@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements.
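To make the lock-stack idea above more concrete, here is a minimal Java sketch of a fixed-size, per-thread array of object references with an identity scan for the "does the current thread own me?" query. It is only an illustration under stated assumptions: the class name `LockStackSketch` and its methods `push`, `contains`, `remove` and `size` are hypothetical and are not taken from the HotSpot patch, which works on oops inside the VM and also CASes the object header, something this sketch does not model.

```java
/** Hypothetical illustration of a fixed-size, per-thread lock stack; not HotSpot code. */
final class LockStackSketch {
    /** Fixed capacity, matching the 8 elements mentioned in the text. */
    static final int CAPACITY = 8;

    /** One lock stack per thread (assumption: a ThreadLocal stands in for per-thread VM state). */
    static final ThreadLocal<LockStackSketch> CURRENT =
            ThreadLocal.withInitial(LockStackSketch::new);

    private final Object[] slots = new Object[CAPACITY];
    private int top = 0;

    /** Records obj as fast-locked by this thread; returns false when the stack is full. */
    boolean push(Object obj) {
        if (top == CAPACITY) {
            return false; // caller would have to fall back to a slower path
        }
        slots[top++] = obj;
        return true;
    }

    /** Answers "does this thread own obj?" with a linear scan comparing by identity. */
    boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == obj) {
                return true;
            }
        }
        return false;
    }

    /** Removes the most recently pushed entry for obj; returns false if it is absent. */
    boolean remove(Object obj) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == obj) {
                // Close the gap so the remaining entries stay densely packed.
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null;
                return true;
            }
        }
        return false;
    }

    /** Number of objects currently recorded. */
    int size() {
        return top;
    }
}
```

Because the array holds at most eight entries, the linear scan in `contains` stays cheap, which is the property the description relies on for the common ownership check.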
According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks.
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention.
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Replace UseHeavyMonitor with LockingMode == LM_MONITOR - Prefix LockingMode constants with LM_* ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/cb260c1f..f5451943 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=57 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=56-57 Stats: 778 lines in 43 files changed: 257 ins; 265 del; 256 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From amenkov at openjdk.org Wed Apr 12 20:19:36 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 12 Apr 2023 20:19:36 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v7] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 14:55:59 GMT, Alan Bateman wrote: > In the spec for FollowReferences, it says that the heap roots include "references from thread stacks". There is a similar sentence in the deprecated IterateOverReachableObjects function. We should decide whether these sentences need to be changed to say "platform thread stacks". IterateOverReachableObjects spec is stricter. It contains: `Roots are always reported to the profiler before any object references are reported.` That is the reason this fix fixes only FollowReferences (stack references for unmounted VTs are collected only when is_advanced_heap_walk() is true), because otherwise we'd have to find all stack references before visiting the object, i.e. it requires a full heap scan in the beginning.
I don't care much about deprecated IterateOverReachableObjects (it shouldn't be used by modern agents), but I think a change in the spec would allow fixing IterateOverReachableObjects as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1505869437 From rrich at openjdk.org Wed Apr 12 20:55:09 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 12 Apr 2023 20:55:09 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 15:48:53 GMT, Patricio Chilano Mateo wrote: > Please review this fix. The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. > The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. > The fix was tested against a reproducer which I attached to the bug. > > Thanks, > Patricio The fix looks good to me. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13446#pullrequestreview-1382101292 From fyang at openjdk.org Thu Apr 13 00:15:57 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Apr 2023 00:15:57 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v3] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 10:47:43 GMT, Fei Yang wrote: >> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8301496 >> - Fixes >> - Merge remote-tracking branch 'origin/master' into JDK-8301496 >> - Fixes >> - Merge remote-tracking branch 'origin/master' into JDK-8301496 >> - Replace NULL with nullptr in cpu/riscv > > Update change looks good. Thanks. > Thanks @RealFYang! Would you mind running the tier1 tests on RISC-V? I don't have access to that architecture. Sure! I have performed tier1 test on linux-riscv64 hifive unmatched boards, result looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12324#issuecomment-1506127350 From dholmes at openjdk.org Thu Apr 13 05:49:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Apr 2023 05:49:37 GMT Subject: RFR: 8305936: JavaThread::create_system_thread_object has unused is_visible argument Message-ID: Please review this simple cleanup of an unused parameter in `create_system_thread_object`. Details are in JBS. Testing: tiers 1-3 Thanks.
------------- Commit messages: - 8305936: JavaThread::create_system_thread_object has unused is_visible argument Changes: https://git.openjdk.org/jdk/pull/13455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13455&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305936 Stats: 17 lines in 9 files changed: 0 ins; 2 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/13455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13455/head:pull/13455 PR: https://git.openjdk.org/jdk/pull/13455 From alanb at openjdk.org Thu Apr 13 06:19:32 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 13 Apr 2023 06:19:32 GMT Subject: RFR: 8305936: JavaThread::create_system_thread_object has unused is_visible argument In-Reply-To: References: Message-ID: <76L-hhJOUzq5PZrz4ky2gi_HaDQ-nI3f9Bs0WZHdLzA=.14d85a40-45b8-4e45-91e1-435787acee81@github.com> On Thu, 13 Apr 2023 05:41:31 GMT, David Holmes wrote: > Please review this simple cleanup of an unused parameter in `create_system_thread_object`. Details are in JBS. > > Testing: tiers 1-3 > > Thanks. Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13455#pullrequestreview-1382703726 From dholmes at openjdk.org Thu Apr 13 06:32:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Apr 2023 06:32:39 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v7] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 14:55:59 GMT, Alan Bateman wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed indent in collect_vthread_stack_roots > > In the spec for FollowReferences, it says that the heap roots include "references from thread stacks". There is a similar sentence in the deprecated IterateOverReachableObjects function. 
We should decide whether these sentences need to be changed to say "platform thread stacks". @AlanBateman as virtual threads have already been defined to not be GC roots aka "heap roots" then it would seem the spec does need adjusting to say "platform threads". ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1506420491 From pminborg at openjdk.org Thu Apr 13 06:36:51 2023 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 13 Apr 2023 06:36:51 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v23] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > ### Tests > > - [X] Tier1 > - [X] Tier2 > - [ ] Tier3 > - [ ] Tier4 > - [ ] Tier5 > - [ ] Tier6 Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #3 from JornVernee/IsForeignLinkerSupported rename has_port - rename has_port ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/6164abe8..91f43d13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=21-22 Stats: 10 lines in 10 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From dholmes at openjdk.org Thu Apr 13 07:00:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Apr 2023 07:00:32 GMT Subject: RFR: 8305936: JavaThread::create_system_thread_object has unused is_visible argument In-Reply-To: 
<76L-hhJOUzq5PZrz4ky2gi_HaDQ-nI3f9Bs0WZHdLzA=.14d85a40-45b8-4e45-91e1-435787acee81@github.com> References: <76L-hhJOUzq5PZrz4ky2gi_HaDQ-nI3f9Bs0WZHdLzA=.14d85a40-45b8-4e45-91e1-435787acee81@github.com> Message-ID: On Thu, 13 Apr 2023 06:17:00 GMT, Alan Bateman wrote: >> Please review this simple cleanup of an unused parameter in `create_system_thread_object`. Details are in JBS. >> >> Testing: tiers 1-3 >> >> Thanks. > > Marked as reviewed by alanb (Reviewer). Thanks for the review @AlanBateman ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13455#issuecomment-1506449282 From qamai at openjdk.org Thu Apr 13 07:05:54 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Apr 2023 07:05:54 GMT Subject: Integrated: 8304450: [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: <1_1SPocmj-NTrY9ZZ35vVCt7Gc4dVtZVIxeyJwXrBj0=.66358c02-6efa-4d04-9e8b-3ceb12c6af66@github.com> On Sun, 19 Mar 2023 13:04:19 GMT, Quan Anh Mai wrote: > Hi, > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {external_word} > vpackusdw %xmm0,%xmm0,%xmm0 > vpackuswb %xmm0,%xmm0,%xmm0 > vpmovsxbd %xmm0,%xmm3 > vpcmpgtd %xmm3,%xmm1,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fc2acb4e0d8 > vpmovzxbd %xmm0,%xmm0 > vpermd %ymm2,%ymm0,%ymm0 > movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} > vmovdqu %xmm0,0x10(%r10) > > After: > movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} > vmovdqu 0x10(%r10),%xmm2 > vpxor %xmm0,%xmm0,%xmm0 > vpcmpgtd %xmm2,%xmm0,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fa818b27cb1 > vpermd %ymm1,%ymm2,%ymm0 > movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} > vmovdqu %xmm0,0x10(%r10) > > Please take a look and leave reviews. Thanks a lot. This pull request has now been integrated. 
Changeset: e846a1d7 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/e846a1d70043f7b57ae76847e85e5426c86539a5 Stats: 3690 lines in 64 files changed: 1615 ins; 1169 del; 906 mod 8304450: [vectorapi] Refactor VectorShuffle implementation Reviewed-by: psandoz, xgong, jbhateja, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/13093 From cjplummer at openjdk.org Thu Apr 13 07:15:39 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 13 Apr 2023 07:15:39 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v7] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 14:55:59 GMT, Alan Bateman wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed indent in collect_vthread_stack_roots > > In the spec for FollowReferences, it says that the heap roots include "references from thread stacks". There is a similar sentence in the deprecated IterateOverReachableObjects function. We should decide whether these sentences need to be changed to say "platform thread stacks". > @AlanBateman as virtual threads have already been defined to not be GC roots aka "heap roots" then it would seem the spec does need adjusting to say "platform threads". I know the implementation of virtual threads in hotspot does not treat virtual threads as GC roots w.r.t. garbage collection. However, that's a JVM implementation detail, and I'm not so sure I would extend that to imply that "virtual threads have already been defined to not be GC roots" from a spec perspective. Do we actually say that anywhere in a spec? We have a similar situation with hprof, and the current plan is to include a `HPROF_GC_ROOT_THREAD_OBJ` record for each virtual thread (and accompanying set of `HPROF_GC_ROOT_JAVA_FRAME` records for the stack). 
In other words, virtual threads will be GC roots for any tool doing analysis of the hprof heap dump. Note that decisions in this area are very much in flux, and we are also considering making it optional whether or not hprof treats virtual threads in this manner. My point isn't to hash out hprof in this discussion, but just to make sure we differentiate between spec and implementation when it comes to treating virtual threads as roots. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1506465610 From dholmes at openjdk.org Thu Apr 13 07:25:42 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Apr 2023 07:25:42 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v7] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 01:12:49 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected and reported as JVMTI_HEAP_REFERENCE_THREAD; >> - stack references for unmounted VT are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Fixed indent in collect_vthread_stack_roots I don't see how you can make a distinction between spec and implementation in this particular area - an object is either a GC root or it is not. If it is then that has implications for reachability as well as these traversal API's. Having two different notions of "root" will just lead to confusion. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1506477528 From rkennke at openjdk.org Thu Apr 13 07:30:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 13 Apr 2023 07:30:26 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v59] In-Reply-To: References: Message-ID: <1I0RTRTux5ZmqOwFkLGlzB5utmIliV7c4U74daaL9P0=.bc46d7fe-dde9-4698-80ab-a946c750b714@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
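The per-thread lock-stack described in the PR text above can be modelled with a small toy class. This is only a sketch under stated assumptions, not HotSpot code: the class `LockStackSketch` and its method names are hypothetical, it is written in Java rather than the VM's C++, and it leaves out the CAS on the object header, GC root scanning, and monitor inflation entirely. It shows just the fixed capacity of 8, the linear "does the current thread own me?" scan, and the cases (full stack, recursive lock) where the fast path would give up.

```java
// Hypothetical toy model of a thread-local lock-stack; not the HotSpot implementation.
final class LockStackSketch {
    static final int CAPACITY = 8; // fixed size, as stated in the PR description

    private final Object[] slots = new Object[CAPACITY];
    private int top = 0;

    int size() {
        return top;
    }

    boolean isFull() {
        return top == CAPACITY;
    }

    // "Does the current thread own me?": a quick identity scan of the small array.
    boolean contains(Object o) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == o) {
                return true;
            }
        }
        return false;
    }

    // Fast-path lock attempt. Returns false when the caller would have to take
    // the slow path instead (stack full, or a recursive lock on the same object).
    boolean tryPush(Object o) {
        if (isFull() || contains(o)) {
            return false;
        }
        slots[top++] = o;
        return true;
    }

    // Fast-path unlock: drop the entry and keep the remaining entries densely packed.
    boolean remove(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == o) {
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null;
                return true;
            }
        }
        return false;
    }
}
```

Keeping the entries in one dense array is what makes the ownership query a short scan, and it is also the property the description points to when it speculates that a virtual thread could save and restore its lock-stack as a unit.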
>
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> |  | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
>   | x86_64 |   |   |   | aarch64 |   |  
> -- | -- | -- | -- | -- | -- | -- | --
>   | stack-locking | fast-locking |   |   | stack-locking | fast-locking |  
> AkkaUct | 841.884 | 836.948 | 0.59% |   | 1475.774 | 1465.647 | 0.69%
> Reactors | 11041.427 | 11181.451 | -1.25% |   | 11381.751 | 11521.318 | -1.21%
> Als | 1367.183 | 1359.358 | 0.58% |   | 1678.103 | 1688.067 | -0.59%
> ChiSquare | 577.021 | 577.398 | -0.07% |   | 986.619 | 988.063 | -0.15%
> GaussMix | 817.459 | 819.073 | -0.20% |   | 1154.293 | 1155.522 | -0.11%
> LogRegression | 598.343 | 603.371 | -0.83% |   | 638.052 | 644.306 | -0.97%
> MovieLens | 8248.116 | 8314.576 | -0.80% |   | 7569.219 | 7646.828 | -1.01%
> NaiveBayes | 587.607 | 581.608 | 1.03% |   | 541.583 | 550.059 | -1.54%
> PageRank | 3260.553 | 3263.472 | -0.09% |   | 4376.405 | 4381.101 | -0.11%
> FjKmeans | 979.978 | 976.122 | 0.40% |   | 774.312 | 771.235 | 0.40%
> FutureGenetic | 2187.369 | 2183.271 | 0.19% |   | 2685.722 | 2689.056 | -0.12%
> ParMnemonics | 2434.551 | 2468.763 | -1.39% |   | 4278.225 | 4263.863 | 0.34%
> Scrabble | 111.882 | 111.768 | 0.10% |   | 151.796 | 153.959 | -1.40%
> RxScrabble | 210.252 | 211.38 | -0.53% |   | 310.116 | 315.594 | -1.74%
> Dotty | 750.415 | 752.658 | -0.30% |   | 1033.636 | 1036.168 | -0.24%
> ScalaDoku | 3072.05 | 3051.2 | 0.68% |   | 3711.506 | 3690.04 | 0.58%
> ScalaKmeans | 211.427 | 209.957 | 0.70% |   | 264.38 | 265.788 | -0.53%
> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |   | 1088.182 | 1092.266 | -0.37%
> Philosophers | 6450.124 | 6565.705 | -1.76% |   | 12017.964 | 11902.559 | 0.97%
> FinagleChirper | 3953.623 | 3972.647 | -0.48% |   | 4750.751 | 4769.274 | -0.39%
> FinagleHttp | 3970.526 | 4005.341 | -0.87% |   | 5294.125 | 5296.224 | -0.04%

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  A few more LM_ prefixes in 32bit code

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/f5451943..db4ca102

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=58
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=57-58

  Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From kbarrett at openjdk.org Thu Apr 13 08:51:32 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 13 Apr 2023 08:51:32 GMT Subject: RFR: 8305936: JavaThread::create_system_thread_object has unused is_visible argument In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 05:41:31 GMT, David Holmes wrote:

> Please review this simple cleanup of an unused parameter in `create_system_thread_object`. Details are in JBS.
>
> Testing: tiers 1-3
>
> Thanks.

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13455#pullrequestreview-1382942892 From alanb at openjdk.org Thu Apr 13 09:59:37 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 13 Apr 2023 09:59:37 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v7] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 12 Apr 2023 14:55:59 GMT, Alan Bateman wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed indent in collect_vthread_stack_roots > > In the spec for FollowReferences, it says that the heap roots include "references from thread stacks". There is a similar sentence in the deprecated IterateOverReachableObjects function. We should decide whether these sentences need to be changed to say "platform thread stacks". > @AlanBateman as virtual threads have already been defined to not be GC roots aka "heap roots" then it would seem the spec does need adjusting to say "platform threads". Yes, I think the sentences in both FollowReferences and IterateOverReachableObjects will need to be revisited. It is GC specific but historically the JVMTI spec has tried to specify something. Changing it to say platform threads does not prevent an implementation from using the thread stacks of all threads. Additionally, we might have to re-examine the descriptions of both JVMTI_HEAP_REFERENCE_STACK_LOCAL and JVMTI_HEAP_REFERENCE_JNI_LOCAL in jvmtiHeapReferenceKind. Doing the spec changes in a separate PR might be okay, I don't have any opinion on that.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1506685530 From sspitsyn at openjdk.org Thu Apr 13 10:06:51 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 13 Apr 2023 10:06:51 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 10:43:31 GMT, Markus Grönlund wrote: >> src/hotspot/share/prims/jvmtiAgent.cpp line 323: >> >>> 321: assert(agent != nullptr, "invariant"); >>> 322: if (!agent->is_loaded()) { >>> 323: if (!load_agent_from_executable(agent, on_load_symbols, num_symbol_entries)) { >> >> It feels like I'm missing something. >> We already checked and found at line 322 that `agent->is_loaded() == false`. >> Also, we have the comment at line 265: >> >> 265 // If this function returns true, then agent->is_static_lib().&& agent->is_loaded(). >> 266 static bool load_agent_from_executable(Agent* agent, const char* on_load_symbols[], size_t num_symbol_entries) { >> >> As the `agent->is_loaded() == false` then the condition `agent->is_static_lib() && agent->is_loaded()` has to be `false` and can not be `true`. Then one of the if-checks at lines 322 and 323 is not needed and can be removed. Is it right? Otherwise, the comment at line 265 can be incorrect. > > Good observation, Serguei. > > It is because some paths call into lookup_On_load_Entry_point() twice. > > It is primarily the attempted conversion of xrun agents, the first invocation comes from JvmtiAgent::convert_xrun_agent(). This will have the agent "loaded". If there is an Agent_OnLoad function, the agent is converted (i.e. xrun removed). > > Then when the agent is to invoke the Agent_OnLoad function, there is a second invocation. Here a converted xrun library is already loaded, so I bypass attempting to load it again by checking the is_loaded() property. Thanks.
>> src/hotspot/share/prims/jvmtiAgent.cpp line 357: >> >>> 355: vm_exit_during_initialization("Could not find JVM_OnLoad or Agent_OnLoad function in the library", name()); >>> 356: } >>> 357: _xrun = false; // converted >> >> Just questions to understand it better. >> Neither `JVM_Onload` nor `Agent_Onload` entry points are stored after these lookups. It means that in order to be called later (as the comment at line 350 says) they have to be looked up again. >> Is it right? Was it the same originally? > > The entry points are not saved and so have to be looked up again. It was the same originally. > > That is why there is a check and branch on agent->is_loaded(). Thank you. I'm okay with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1165306181 PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1165307170 From sspitsyn at openjdk.org Thu Apr 13 10:10:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 13 Apr 2023 10:10:47 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. 
>> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). 
This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization ? as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. 
This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > renames What was the reason to clone the classes below? `JvmtiJavaThreadEventTransition` => `AgentJavaThreadEventTransition` `JvmtiThreadEventMark` => `AgentThreadEventMark` `JvmtiEventMark` => `AgentEventMark` ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1506701619 From sspitsyn at openjdk.org Thu Apr 13 10:17:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 13 Apr 2023 10:17:44 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime.
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. 
>> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization ? 
as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > renames Your fix introduced a hidden dependency of this new structure on the JPLISEnvironment structure and some Java agents implementation details: 202 struct JPLISEnvironmentMirror { 203 jvmtiEnv* mJVMTIEnv; // the JVMTI environment 204 const void* mAgent; // corresponding agent 205 jboolean mIsRetransformer; // indicates if special environment 206 }; It does not look good to me but I can't suggest any other approach at the moment. How important is this part? Have you considered other ways to achieve what is needed? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1506711461 From sspitsyn at openjdk.org Thu Apr 13 10:27:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 13 Apr 2023 10:27:48 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 14:39:19 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. 
>> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. 
If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > renames src/hotspot/share/prims/jvmtiAgentList.cpp line 72: > 70: // there exist an order requirement to iterate oldest -> newest.
Our concurrent storage linked-list is newest -> oldest. > 71: // The correct order is preserved by the iterator, by storing a filtered set of entries in a stack. > 72: JvmtiAgentList::Iterator::Iterator(JvmtiAgent** list, Filter filter) : _stack(new GrowableArrayCHeap(16)), _filter(filter) { Nit: It'd be nice to make the lines 69-72 shorter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1165328661 From dholmes at openjdk.org Thu Apr 13 10:42:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Apr 2023 10:42:32 GMT Subject: RFR: 8305936: JavaThread::create_system_thread_object has unused is_visible argument In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 08:48:20 GMT, Kim Barrett wrote: >> Please review this simple cleanup of an unused parameter in `create_system_thread_object`. Details are in JBS. >> >> Testing: tiers 1-3 >> >> Thanks. > > Looks good. Thanks for the review @kimbarrett! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13455#issuecomment-1506740659 From mdoerr at openjdk.org Thu Apr 13 11:00:38 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 13 Apr 2023 11:00:38 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc [v2] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 12:40:32 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs.
We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Revert change in file > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - reinrich suggestions > - Replace NULL with nullptr in cpu/ppc LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12323#pullrequestreview-1383163986 From mgronlun at openjdk.org Thu Apr 13 11:48:43 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 13 Apr 2023 11:48:43 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 10:07:50 GMT, Serguei Spitsyn wrote: > What was the reason to clone the classes below ?: `JvmtiJavaThreadEventTransition` => `AgentJavaThreadEventTransition` `JvmtiThreadEventMark` => `AgentThreadEventMark` `JvmtiEventMark` => `AgentEventMark` The reason is they are used when invoking Agent_OnAttach. Those classes are defined in jvmtiExport.cpp, so not reachable. I considered exporting them, but it would require additional headers to be included. I opted for just replicating them, also with static linkage.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1506825122 From mgronlun at openjdk.org Thu Apr 13 11:54:45 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 13 Apr 2023 11:54:45 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: <7KrE0kjLkzjP1sEUxDNXAr3r64GUwZzOckaM9QeHhqs=.4232e5a4-a065-41ee-8240-6729b70f21cb@github.com> On Thu, 13 Apr 2023 10:15:02 GMT, Serguei Spitsyn wrote: > Your fix introduced a hidden dependency of this new structure on the JPLISEnvironment structure and some Java agents implementation details: > > ``` > struct JPLISEnvironmentMirror { > jvmtiEnv* mJVMTIEnv; // the JVMTI environment > const void* mAgent; // corresponding agent > jboolean mIsRetransformer; // indicates if special environment > }; > ``` > > It does not look good to me but I can't suggest any other approach at the moment. How important is this part? Have you considered other ways to achieve what is needed? Yes. It is the key to locating which JavaAgent maps to which JvmtiEnv. I tried some other variants, but those would change the layout of the exported structs in jplisAgent.h, and I don't know if people depend on that layout, implicitly or explicitly. So I chose not to go down that route. This seemed the best alternative since we own jdk.instrument and the implementation on the JDK side is unlikely to change very much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1506830279 From mgronlun at openjdk.org Thu Apr 13 12:16:16 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 13 Apr 2023 12:16:16 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs.
Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initializationTime = 12:31:15.574 (2023-03-08) > initializationDuration = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initializationTime = 12:31:31.037 (2023-03-08) > initializationDuration = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. 
> > "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initializationTime = 12:31:36.142 (2023-03-08) > initializationDuration = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization;
as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. 
> > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: line breaks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/d0fd9e97..85f06038 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=15-16 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Apr 13 12:16:18 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 13 Apr 2023 12:16:18 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v16] In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 10:24:55 GMT, Serguei Spitsyn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> renames > > src/hotspot/share/prims/jvmtiAgentList.cpp line 72: > >> 70: // there exist an order requirement to iterate oldest -> newest. Our concurrent storage linked-list is newest -> oldest. >> 71: // The correct order is preserved by the iterator, by storing a filtered set of entries in a stack. >> 72: JvmtiAgentList::Iterator::Iterator(JvmtiAgent** list, Filter filter) : _stack(new GrowableArrayCHeap(16)), _filter(filter) { > > Nit: It'd be nice to make the lines 69-72 shorter. Ok. Fixed.
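For readers following the ordering comment quoted above (storage is newest -> oldest, iteration must be oldest -> newest), here is a minimal sketch of the idea in standalone C++. It is not the HotSpot code: `Agent`, `add_agent` and `oldest_first` are hypothetical names, and `std::vector` stands in for `GrowableArrayCHeap`. Entries are prepended with a compare-and-swap, and a filtered traversal pushes matches onto a stack that is then popped to restore insertion order.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an agent entry; not the HotSpot JvmtiAgent class.
struct Agent {
  std::string name;
  bool dynamic;   // true if loaded at runtime rather than on the command line
  Agent* next;    // link to the next-older entry
};

// Lock-free prepend: after each call the head points at the newest entry,
// so the list is stored newest -> oldest.
void add_agent(std::atomic<Agent*>& head, Agent* agent) {
  assert(agent != nullptr);
  Agent* old_head = head.load();
  do {
    agent->next = old_head;
  } while (!head.compare_exchange_weak(old_head, agent));
}

// Walk the list newest -> oldest, keep the entries accepted by the filter on
// a stack, then pop the stack so callers see them oldest -> newest.
template <typename Filter>
std::vector<const Agent*> oldest_first(const std::atomic<Agent*>& head, Filter filter) {
  std::vector<const Agent*> stack;
  for (const Agent* a = head.load(); a != nullptr; a = a->next) {
    if (filter(*a)) {
      stack.push_back(a);
    }
  }
  std::vector<const Agent*> ordered;
  ordered.reserve(stack.size());
  while (!stack.empty()) {
    ordered.push_back(stack.back());
    stack.pop_back();
  }
  return ordered;
}
```

A caller would add agents in load order and then request, for example, only the non-dynamic ones with `oldest_first(head, [](const Agent& a) { return !a.dynamic; })`. The snapshot is taken from a single read of the head, which is why concurrent prepends do not disturb an iteration already in progress in this sketch.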
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1165429976 From lucy at openjdk.org Thu Apr 13 13:18:38 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 13 Apr 2023 13:18:38 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Wed, 29 Mar 2023 16:56:42 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/stubRoutines_s390.cpp line 58: >> >>> 56: __ z_cgr(table, Z_R0); // safety net >>> 57: __ z_bre(L); >>> 58: __ asm_assert(Assembler::bcondEqual, "crc_table: external word relocation required for load_absolute_address", 0x33); >> >> How should this `asm_assert` ever get hit? It's preceded by a `bre.` > > This was requested change (initially it was not part of this PR). The request was to "move z_illtrap() after asm_assert()". For debug builds, you then get a descriptive assert message. For release builds, you at least have a security net. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1165506536 From pchilanomate at openjdk.org Thu Apr 13 13:42:34 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 13 Apr 2023 13:42:34 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 20:52:25 GMT, Richard Reingruber wrote: > The fix looks good to me. Thanks, Richard. > Thanks for the review Richard! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13446#issuecomment-1506982194 From mdoerr at openjdk.org Thu Apr 13 13:42:38 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 13 Apr 2023 13:42:38 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> On Thu, 13 Apr 2023 13:15:49 GMT, Lutz Schmidt wrote: >> This was requested change (initially it was not part of this PR). > > The request was to "move z_illtrap() after asm_assert()". For debug builds, you then get a descriptive assert message. For release builds, you at least have a security net. `z_bre` + `asm_assert(Assembler::bcondEqual, ...` still doesn't make sense to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1165538813 From lucy at openjdk.org Thu Apr 13 13:49:36 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 13 Apr 2023 13:49:36 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Thu, 13 Apr 2023 13:39:48 GMT, Martin Doerr wrote: >> The request was to "move z_illtrap() after asm_assert()". For debug builds, you then get a descriptive assert message. For release builds, you at least have a security net. > > `z_bre` + `asm_assert(Assembler::bcondEqual, ...` still doesn't make sense to me. z_bre is a "branch if equal". Everything is fine if both registers contain the same value. Otherwise, there is something wrong. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1165549006 From mdoerr at openjdk.org Thu Apr 13 14:24:39 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 13 Apr 2023 14:24:39 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Thu, 13 Apr 2023 13:47:09 GMT, Lutz Schmidt wrote: >> `z_bre` + `asm_assert(Assembler::bcondEqual, ...` still doesn't make sense to me. > > z_bre is a "branch if equal". Everything is fine if both registers contain the same value. Otherwise, there is something wrong. Seems like you didn't get my point. The `z_bre` is fine, but the assert condition "bcondEqual" can't be true after that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1165604157 From coleen.phillimore at oracle.com Thu Apr 13 14:43:23 2023 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 13 Apr 2023 10:43:23 -0400 Subject: [Investigation] Considering using a hashtable to store the signature handlers In-Reply-To: References: <7d49663e-6a97-c1ff-e41e-cab3c04c3f26@littlepinkcloud.com> Message-ID: This approach seems fine, depending on how useful it is for improving timing. Remember that this hashtable needs a lock for lookup and modification. This appears to be the case with the existing code. Coleen On 4/12/23 9:02 AM, Guoxiong Li wrote: > On Tue, Apr 11, 2023 at 5:00 PM Andrew Haley > wrote: > > I would measure the time taken for the operations of insertion and > lookup > over a realistic range. > > > Thanks for the test. > > The draft patch [1] may be useful to you.
> > [1] https://github.com/lgxbslgx/jdk/tree/SIGNATURE_HANDLERS > > Best Regards, > -- Guoxiong > From lucy at openjdk.org Thu Apr 13 15:15:39 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 13 Apr 2023 15:15:39 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Thu, 13 Apr 2023 14:22:07 GMT, Martin Doerr wrote: >> z_bre is a "branch if equal". Everything is fine if both registers contain the same value. Otherwise, there is something wrong. > > Seems like you didn't get my point. The `z_bre` is fine, but the assert condition "bcondEqual" can't be true after that. The asm_assert should be unconditional. Arrgh! You are right. Using the inverse condition (bcondNotEqual) is a better choice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1165683269 From mdoerr at openjdk.org Thu Apr 13 16:36:07 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 13 Apr 2023 16:36:07 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Thu, 13 Apr 2023 15:12:55 GMT, Lutz Schmidt wrote: >> Seems like you didn't get my point. The `z_bre` is fine, but the assert condition "bcondEqual" can't be true after that. The asm_assert should be unconditional. > > Arrgh! You are right. Using the inverse condition (bcondNotEqual) is a better choice. What about using `stop` or anything which doesn't take a condition? That would avoid confusion.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1165779006 From tsteele at openjdk.org Thu Apr 13 17:40:38 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 13 Apr 2023 17:40:38 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc [v2] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 12:40:32 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Revert change in file > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - reinrich suggestions > - Replace NULL with nullptr in cpu/ppc Thanks for reaching out. This looks good to me, and builds on AIX.
Testing is strange on AIX atm (there are some unrelated failures), so I'm having trouble getting a good clean run. I'm happy to fix any issues that come up later. ------------- Marked as reviewed by tsteele (Committer). PR Review: https://git.openjdk.org/jdk/pull/12323#pullrequestreview-1383913783 From matsaave at openjdk.org Thu Apr 13 17:43:34 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 13 Apr 2023 17:43:34 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v5] In-Reply-To: <3wLQHgnmbwNVuVVZDe1Nlt6V6upyOrUXDxKKkMEOT8Y=.9a9c2b21-e683-4bd0-94ee-8ebcaf1b8333@github.com> References: <3wLQHgnmbwNVuVVZDe1Nlt6V6upyOrUXDxKKkMEOT8Y=.9a9c2b21-e683-4bd0-94ee-8ebcaf1b8333@github.com> Message-ID: On Wed, 12 Apr 2023 17:59:23 GMT, Ioi Lam wrote: >> This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. >> >> **Notes for reviewers:** >> - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. >> - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). >> - It might be easier to see the diff with whitespaces off. >> - There are two major changes in the G1 code >> - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) >> - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) >> - Testing changes: >> - Now the archived java objects can move, along with the "old" regions that contain them. 
It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. >> - Many tests that use this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. >> >> **Testing:** >> - Mach5 tiers 1 ~ 7 > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @ashu-mehra comments; some clean up The changes look good to me! ------------- Marked as reviewed by matsaave (Committer). PR Review: https://git.openjdk.org/jdk/pull/13284#pullrequestreview-1383920908 From dholmes at openjdk.org Thu Apr 13 23:13:41 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Apr 2023 23:13:41 GMT Subject: Integrated: 8305936: JavaThread::create_system_thread_object has unused is_visible argument In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 05:41:31 GMT, David Holmes wrote: > Please review this simple cleanup of an unused parameter in `create_system_thread_object`. Details are in JBS. > > Testing: tiers 1-3 > > Thanks. This pull request has now been integrated. Changeset: 8a1639d4 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/8a1639d49b4adc45501fe77cedfef3ca5f42c7f5 Stats: 17 lines in 9 files changed: 0 ins; 2 del; 15 mod 8305936: JavaThread::create_system_thread_object has unused is_visible argument Reviewed-by: alanb, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/13455 From sspitsyn at openjdk.org Fri Apr 14 07:38:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 14 Apr 2023 07:38:47 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 12:16:16 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs.
Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. 
>> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization;
as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > line breaks Markus, thank you for your answers. I'm okay with it as I can't suggest anything better. These issues are local and the spots are pretty small, so it is not that bad and looks like a reasonable compromise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1508063709 From sspitsyn at openjdk.org Fri Apr 14 07:46:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 14 Apr 2023 07:46:42 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 12:16:16 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents.
>> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initializationTime = 12:31:15.574 (2023-03-08) >> initializationDuration = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initializationTime = 12:31:31.037 (2023-03-08) >> initializationDuration = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. 
>> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initializationTime = 12:31:36.142 (2023-03-08) >> initializationDuration = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. 
>> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > line breaks src/hotspot/share/prims/jvmtiExport.cpp line 717: > 715: jvmtiEventVMInit callback = env->callbacks()->VMInit; > 716: if (callback != nullptr) { > 717: JvmtiAgent* const agent = lookup_uninitialized_agent(env, reinterpret_cast(callback)); It was a surprise to me to discover this code in your changes. 
How can it be possible that some agents are left uninitialized at the time of VMInit event? Have you really observed this? If so, then can you add a comment explaining the need of this initialization? Is it needed for JFR purposes only? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1166421178 From sspitsyn at openjdk.org Fri Apr 14 07:53:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 14 Apr 2023 07:53:49 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: References: Message-ID: <73FsVYPjYuC09PElaj8hwZRKc55HfqdVKsJqJiQVrBU=.9d7d395e-5618-4148-b20c-04d9e5184c01@github.com> On Thu, 13 Apr 2023 12:16:16 GMT, Markus Grönlund wrote: > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > line breaks Markus, It looks good to me. Overall, it is a nice consolidation of the agent code, good move in general! Thank you for your patience. I've posted one minor request though. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12923#pullrequestreview-1384843610 From lkorinth at openjdk.org Fri Apr 14 08:53:42 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Fri, 14 Apr 2023 08:53:42 GMT Subject: Integrated: 8305618: Move gcold out of tier1 In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 20:42:35 GMT, Leo Korinth wrote: > 8305618: Move gcold out of tier1 > > Remove gcold out from tier1. Related to [JDK-8298981](https://bugs.openjdk.org/browse/JDK-8298981). Moving gcbasher out of tier1 was more controversial, and will be done later --- if at all. This pull request has now been integrated. 
Changeset: c0c31224 Author: Leo Korinth URL: https://git.openjdk.org/jdk/commit/c0c31224db205616baadfb89a3fe3259f3cce3f2 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod 8305618: Move gcold out of tier1 Reviewed-by: lmesnik, shade ------------- PR: https://git.openjdk.org/jdk/pull/13340 From jsjolen at openjdk.org Fri Apr 14 08:58:50 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 14 Apr 2023 08:58:50 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc [v2] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 12:40:32 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - Revert change in file > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301495 > - reinrich suggestions > - Replace NULL with nullptr in cpu/ppc Thanks for all of the help with this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12323#issuecomment-1508167859 From jsjolen at openjdk.org Fri Apr 14 08:58:52 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 14 Apr 2023 08:58:52 GMT Subject: Integrated: JDK-8301495: Replace NULL with nullptr in cpu/ppc In-Reply-To: References: Message-ID: <9ZOUnGxEYCgI180GB_gMmeD-vALC_la2KA3Z5CdNJGU=.33533507-be1d-4d1d-add0-9c525ed19fab@github.com> On Tue, 31 Jan 2023 11:39:48 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated. 
Changeset: 0826ceee Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/0826ceee65ab83f643a77716f8f12d0060369923 Stats: 381 lines in 51 files changed: 0 ins; 0 del; 381 mod 8301495: Replace NULL with nullptr in cpu/ppc Reviewed-by: rrich, mdoerr, tsteele ------------- PR: https://git.openjdk.org/jdk/pull/12323 From mgronlun at openjdk.org Fri Apr 14 09:24:44 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 14 Apr 2023 09:24:44 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: References: Message-ID: <0gj2kjWDMnEpt_58UH60JCzURC9ZuOowVLom48A69n0=.0c35359f-514a-4526-95ae-2a256d9372d7@github.com> On Fri, 14 Apr 2023 07:43:23 GMT, Serguei Spitsyn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> line breaks > > src/hotspot/share/prims/jvmtiExport.cpp line 717: > >> 715: jvmtiEventVMInit callback = env->callbacks()->VMInit; >> 716: if (callback != nullptr) { >> 717: JvmtiAgent* const agent = lookup_uninitialized_agent(env, reinterpret_cast(callback)); > > It was a surprise to me to discover this code in your changes. > How can it be possible that some agents are left uninitialized at the time of VMInit event? > Have you really observed this? > If so, then can you add a comment explaining the need of this initialization? > Is it needed for JFR purposes only? Ok, the terminology here might be confusing. The concept of an agent being "initialized" is introduced and reported by JFR. For example, the JFR event type definition for a NativeAgent includes two fields: initializationTime and initializationDuration. We report these to let users understand when an agent was initialized (VMInit or Agent_OnAttach), together with the duration it took to execute either. For JavaAgents, it measures the invocation and duration of the premain or agentmain methods. 
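
To make the two fields concrete, here is a minimal sketch of the timestamp-plus-duration idea. It is not the HotSpot implementation: `AgentRecord` and `run_and_stamp` are hypothetical names, and `std::chrono::steady_clock` merely stands in for the JVM's own tick support. The sketch assumes an agent is stamped only once, on the first callback that is timed for it.

```cpp
#include <chrono>
#include <functional>
#include <optional>

// Hypothetical per-agent bookkeeping; this is NOT the HotSpot JvmtiAgent class.
struct AgentRecord {
  using Clock = std::chrono::steady_clock;

  // Empty until the agent has been timestamped ("initialized" in the JFR sense).
  std::optional<Clock::time_point> initialization_time;
  Clock::duration initialization_duration{};

  bool is_initialized() const { return initialization_time.has_value(); }
};

// Invokes the callback (standing in for VMInit, Agent_OnAttach, premain, ...).
// Only the first invocation for a given agent records the start time and the
// duration; later invocations still run the callback but leave the record
// untouched. Returns true if this call stamped the agent.
inline bool run_and_stamp(AgentRecord& agent, const std::function<void()>& callback) {
  if (agent.is_initialized()) {
    callback();
    return false;
  }
  const auto start = AgentRecord::Clock::now();
  callback();
  agent.initialization_duration = AgentRecord::Clock::now() - start;
  agent.initialization_time = start;
  return true;
}
```

With a record like this, a periodic event only has to read `initialization_time` and `initialization_duration` back; it does not need to know which callback produced them.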
"Initialized" does not mean "loaded" (at this point, all agents are loaded), but rather it means the agent has received a timestamp set as a function of VMInit. This timestamp and duration are what we will report in JFR as part of the event. An "uninitialized" agent is an agent who has not yet been timestamped, as part of VMInit, for example. Since an agent can create multiple JvmtiEnvs, the function is called lookup_unitialized_agent(), because we can only have a single timestamp for an agent, but it can, in turn, have multiple JvmtiEnvs. When looking up an agent again, using a second JvmtiEnv created by it, the agent is already "initialized", so no agent is returned. We cannot have the timestamping logic as part of the call out to Agent_OnLoad, because that call happens very early during VM bootstrap, so the Ticks support structures are not yet in place. But, timing the Agent_OnLoad call would be rather meaningless because the agent cannot do much except construct a JvmtiEnv and setting capabilities and callbacks. VMInit is where most of the invocation logic, at least for JavaAgents happens, so the measurements are placed there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1166568901 From jsjolen at openjdk.org Fri Apr 14 09:57:03 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 14 Apr 2023 09:57:03 GMT Subject: RFR: JDK-8301496: Replace NULL with nullptr in cpu/riscv [v3] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 09:49:03 GMT, Johan Sj?len wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/riscv. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. 
No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301496 > - Replace NULL with nullptr in cpu/riscv Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12324#issuecomment-1508253428 From jsjolen at openjdk.org Fri Apr 14 09:57:05 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 14 Apr 2023 09:57:05 GMT Subject: Integrated: JDK-8301496: Replace NULL with nullptr in cpu/riscv In-Reply-To: References: Message-ID: <3BGjp1geCEmIki9SsPWJcW-1zvrIPOlUR-CYm68JwN4=.40b53e19-a68e-461d-a43b-cc1a57b52674@github.com> On Tue, 31 Jan 2023 11:39:59 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/riscv. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. 
No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated. Changeset: d2ce04bb Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/d2ce04bb101002abfdb7c8adb3fa8ea267903c36 Stats: 573 lines in 45 files changed: 0 ins; 0 del; 573 mod 8301496: Replace NULL with nullptr in cpu/riscv Reviewed-by: dholmes, fyang ------------- PR: https://git.openjdk.org/jdk/pull/12324 From rkennke at openjdk.org Fri Apr 14 11:19:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 14 Apr 2023 11:19:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v60] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. 
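
The lock-stack described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the code in the PR: `Object` stands in for an oop, the class and member names are hypothetical, the capacity of 8 is the figure quoted above, and the CAS on the object header is left out entirely. A `false` return from `push` models "leave the fast path and inflate to a monitor".

```cpp
#include <array>
#include <cstddef>

// Stand-in for a Java object reference (an oop); purely illustrative.
struct Object {};

// Hypothetical per-thread lock-stack: a small fixed-size array of the objects
// this thread currently holds fast-locks on.
class LockStack {
 public:
  static constexpr std::size_t kCapacity = 8;  // fixed size quoted in the description

  // A full stack means the caller has to take the slow path.
  bool can_push() const { return top_ < kCapacity; }

  // Records a fast-lock on obj. Returns false when the fast path cannot be used:
  // either the stack is full or obj is already held (recursive locking).
  bool push(const Object* obj) {
    if (!can_push() || contains(obj)) {
      return false;
    }
    slots_[top_++] = obj;
    return true;
  }

  // "Does the current thread own me?" answered by a linear scan of a tiny array.
  bool contains(const Object* obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) return true;
    }
    return false;
  }

  // Drops obj on unlock, keeping the remaining entries densely packed.
  // Returns false if this thread never fast-locked obj.
  bool remove(const Object* obj) {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) {
        for (std::size_t j = i + 1; j < top_; ++j) slots_[j - 1] = slots_[j];
        --top_;
        return true;
      }
    }
    return false;
  }

  std::size_t size() const { return top_; }

 private:
  std::array<const Object*, kCapacity> slots_{};
  std::size_t top_ = 0;
};
```

Because the array is small and contiguous, the ownership check is a handful of pointer compares, which is why a linear scan is a reasonable choice here.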
It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 156 commits: - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2 - A few more LM_ prefixes in 32bit code - Replace UseHeavyMonitor with LockingMode == LM_MONITOR - Prefix LockingMode constants with LM_* - Bunch of comments and typos - Don't use NativeAccess in LockStack::contains() - RISCV update - Put back thread type check in OS::is_lock_owned() - Named constants for LockingMode - Address David's review comments - ... 
and 146 more: https://git.openjdk.org/jdk/compare/d2ce04bb...d0a448c6 ------------- Changes: https://git.openjdk.org/jdk/pull/10907/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=59 Stats: 2505 lines in 69 files changed: 1682 ins; 109 del; 714 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From duke at openjdk.org Fri Apr 14 13:52:29 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 14 Apr 2023 13:52:29 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Message-ID: On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing. By both freezing and thawing the padding we remove the need to change any relative offsets at runtime. Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity tested using QEMU. 
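
The effect on relative offsets can be shown with a small sketch. It models a frame as a flat run of words laid out as [rest of frame | padding | locals]; that ordering, the `FrameLayout` struct and both `freeze_*` functions are assumptions made for illustration and are not the HotSpot `copy_to_chunk()` code. The point is only that dropping the padding moves the locals relative to the rest of the frame, while a single copy keeps every offset as it was on the stack.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Word = std::uintptr_t;

// Hypothetical frame layout in words, from low to high address:
// [ rest of frame | alignment padding | locals ].
struct FrameLayout {
  std::size_t rest_words;
  std::size_t padding_words;
  std::size_t locals_words;

  std::size_t total_words() const { return rest_words + padding_words + locals_words; }
  // Offset of the first local, relative to the start of the frame on the stack.
  std::size_t locals_offset() const { return rest_words + padding_words; }
};

// Two copies that skip the padding: in the resulting chunk the locals start at
// rest_words, so any stored relative offset into the locals has to be adjusted.
std::vector<Word> freeze_two_copies(const Word* frame, const FrameLayout& l) {
  std::vector<Word> chunk(l.rest_words + l.locals_words);
  std::memcpy(chunk.data(), frame, l.rest_words * sizeof(Word));
  std::memcpy(chunk.data() + l.rest_words, frame + l.locals_offset(),
              l.locals_words * sizeof(Word));
  return chunk;
}

// One copy that includes the padding: the chunk has the same shape as the stack
// frame, so relative offsets stay valid without any fix-up.
std::vector<Word> freeze_single_copy(const Word* frame, const FrameLayout& l) {
  std::vector<Word> chunk(l.total_words());
  std::memcpy(chunk.data(), frame, l.total_words() * sizeof(Word));
  return chunk;
}
```

The price of the single copy in this sketch is the extra padding words stored in the chunk, in exchange for not rewriting offsets on freeze and thaw.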
------------- Commit messages: - 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Changes: https://git.openjdk.org/jdk/pull/13477/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13477&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8300197 Stats: 63 lines in 5 files changed: 4 ins; 24 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/13477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13477/head:pull/13477 PR: https://git.openjdk.org/jdk/pull/13477 From mgronlun at openjdk.org Fri Apr 14 14:44:43 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 14 Apr 2023 14:44:43 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: <73FsVYPjYuC09PElaj8hwZRKc55HfqdVKsJqJiQVrBU=.9d7d395e-5618-4148-b20c-04d9e5184c01@github.com> References: <73FsVYPjYuC09PElaj8hwZRKc55HfqdVKsJqJiQVrBU=.9d7d395e-5618-4148-b20c-04d9e5184c01@github.com> Message-ID: On Fri, 14 Apr 2023 07:50:39 GMT, Serguei Spitsyn wrote: > Markus, It looks good to me. Overall, it is a nice consolidation of the agent code, good move in general! Thank you for your patience. I've posted one minor request though. Thanks, Serguei Thanks for taking a look Serguei, appreciate it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1508680453 From rehn at openjdk.org Fri Apr 14 18:38:33 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Apr 2023 18:38:33 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 15:48:53 GMT, Patricio Chilano Mateo wrote: > Please review this fix. The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. 
The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. > The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. > The fix was tested against a reproducer which I attached to the bug. > > Thanks, > Patricio Thanks, seems good! One question, though: I believe the query is wrong. We actually don't care if it's a virtual thread or if it is a plain continuation, right? It's a bit weird since some of the code is 'prepared' for plain continuations, while some is not. The query does: const ContinuationEntry* JavaThread::vthread_continuation() const { for (ContinuationEntry* c = last_continuation(); c != nullptr; c = c->parent()) { if (c->is_virtual_thread()) return c; } return nullptr; } But if we had a plain continuation, the same bug would happen AFAICT. So I would like the query to be jt->have_continuation_mounted(). ------------- Marked as reviewed by rehn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13446#pullrequestreview-1386022701 From pchilanomate at openjdk.org Fri Apr 14 19:05:31 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 14 Apr 2023 19:05:31 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 18:36:10 GMT, Robbin Ehn wrote: > Thanks, seems good! > > One question, though: I believe the query is wrong. We actually don't care if it's a virtual thread or if it is a plain continuation, right? It's a bit weird since some of the code is 'prepared' for plain continuations, while some is not. > > The query does: > > ``` > const ContinuationEntry* JavaThread::vthread_continuation() const { > for (ContinuationEntry* c = last_continuation(); c != nullptr; c = c->parent()) { > if (c->is_virtual_thread()) > return c; > } > return nullptr; > } > ``` > > But if we had a plain continuation, the same bug would happen AFAICT. > > So I would like the query to be jt->have_continuation_mounted(). > Yes, this code would have the same issue with plain continuations. It's just that we are never supposed to encounter one today since they are only used by virtual threads. But I can change the check and use jt->last_continuation() which is already there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13446#issuecomment-1509094961 From rehn at openjdk.org Fri Apr 14 19:34:39 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Apr 2023 19:34:39 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) [v2] In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 19:30:29 GMT, Patricio Chilano Mateo wrote: >> Please review this fix.
The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. >> The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. >> The fix was tested against a reproducer which I attached to the bug. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add explicit nullptr check > - modify check to include plain continuations Marked as reviewed by rehn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13446#pullrequestreview-1386083282 From pchilanomate at openjdk.org Fri Apr 14 19:34:41 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 14 Apr 2023 19:34:41 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) In-Reply-To: References: Message-ID: <3BJXKwBfTO5EgSg-AbrFrurjjfdbSVsRZfKF534oKbQ=.f23562c9-5fdf-4b0e-a9cb-d3a97507b87b@github.com> On Wed, 12 Apr 2023 15:48:53 GMT, Patricio Chilano Mateo wrote: > Please review this fix. The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. > The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. > The fix was tested against a reproducer which I attached to the bug. > > Thanks, > Patricio Thanks for the review Robbin! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13446#issuecomment-1509122278 From pchilanomate at openjdk.org Fri Apr 14 19:34:39 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 14 Apr 2023 19:34:39 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) [v2] In-Reply-To: References: Message-ID: > Please review this fix. The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. > The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. > The fix was tested against a reproducer which I attached to the bug. 
> > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - add explicit nullptr check - modify check to include plain continuations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13446/files - new: https://git.openjdk.org/jdk/pull/13446/files/9a52655d..70359de6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13446&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13446&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13446/head:pull/13446 PR: https://git.openjdk.org/jdk/pull/13446 From cslucas at openjdk.org Fri Apr 14 20:54:45 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 14 Apr 2023 20:54:45 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v8] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically in: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straight forward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also tested with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR review 3. Some comments and be able to abort compilation. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/8ed147f4..a10b0a4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=06-07 Stats: 118 lines in 13 files changed: 60 ins; 11 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From cslucas at openjdk.org Fri Apr 14 20:54:48 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 14 Apr 2023 20:54:48 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v5] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 31 Mar 2023 18:30:19 GMT, Xin Liu wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. > > src/hotspot/share/opto/escape.cpp line 457: > >> 455: found_sr_allocate = true; >> 456: } else { >> 457: ptn->set_scalar_replaceable(false); > > This member function is const. Do we really need to change ptn's property here? > > My reading is ophi is profitable as long as we spot any input object which can be eliminated. How about you just return at line 455? This is actually necessary here. By setting the input to NSR I don't need to later, when performing reduction, check that I can eliminate the node. I can just check that I can scalar replace the input. If I removed this line I'd hit a problem if the merge had an input that is SR but that ME can't eliminate.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1167263888 From cslucas at openjdk.org Fri Apr 14 20:56:39 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 14 Apr 2023 20:56:39 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v4] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <6NDwZSpjSrokmglncPRp4tM7_Hiq4b26dXukhXODpKo=.8ba7efd0-bc44-4f1e-beb8-c1c68bc33515@github.com> Message-ID: On Fri, 24 Mar 2023 16:40:15 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Add support for SR'ing some inputs of merges used for field loads > > You new test failed in GHA testing with 32-bit VM: `Could not find VM flag "UseCompressedOops" in @IR rule 1 at int`. > You need to adjust next rule: `@IR(counts = { IRNode.ALLOC, "2" }, applyIf = { "UseCompressedOops", "false" })` @vnkozlov - I think I addressed all your comments. Please let me know if I missed something or if there is something on that you think need to be improved. @iwanowww - can I ask you to please take a look and let me know what you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1509247613 From kvn at openjdk.org Fri Apr 14 21:49:36 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 14 Apr 2023 21:49:36 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v8] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <1pTfg6PGb3zu3ndvKYt0FSFmkOA01w9qLFtQ_s1BQbE=.7de234bc-5484-4d98-a003-ff86836922b9@github.com> On Fri, 14 Apr 2023 20:54:45 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? 
>> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically in: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straight forward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also tested with several applications and didn't see any failure. 
I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR review 3. Some comments and be able to abort compilation. Nice. I will test it. ------------- PR Review: https://git.openjdk.org/jdk/pull/12897#pullrequestreview-1386210380 From sspitsyn at openjdk.org Fri Apr 14 22:07:27 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 14 Apr 2023 22:07:27 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions Message-ID: This refactoring, which separates the ThreadStart/ThreadEnd event posting code in the JVMTI VTMS transitions, is needed for future work on JVMTI scalability and performance improvements. It makes it easier to put this code on the slow path. Testing: mach5 tiers 1-6 were successful.
------------- Commit messages: - 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions Changes: https://git.openjdk.org/jdk/pull/13484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306028 Stats: 161 lines in 4 files changed: 86 ins; 61 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From sspitsyn at openjdk.org Fri Apr 14 22:24:39 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 14 Apr 2023 22:24:39 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: <0gj2kjWDMnEpt_58UH60JCzURC9ZuOowVLom48A69n0=.0c35359f-514a-4526-95ae-2a256d9372d7@github.com> References: <0gj2kjWDMnEpt_58UH60JCzURC9ZuOowVLom48A69n0=.0c35359f-514a-4526-95ae-2a256d9372d7@github.com> Message-ID: On Fri, 14 Apr 2023 09:20:18 GMT, Markus Grönlund wrote: >> src/hotspot/share/prims/jvmtiExport.cpp line 717: >> >>> 715: jvmtiEventVMInit callback = env->callbacks()->VMInit; >>> 716: if (callback != nullptr) { >>> 717: JvmtiAgent* const agent = lookup_uninitialized_agent(env, reinterpret_cast(callback)); >> >> It was a surprise to me to discover this code in your changes. >> How can it be possible that some agents are left uninitialized at the time of VMInit event? >> Have you really observed this? >> If so, then can you add a comment explaining the need of this initialization? >> Is it needed for JFR purposes only? > > OK, the terminology here might be confusing. The concept of an agent being "initialized" is introduced and reported by JFR. For example, here is the JFR event type definition for a NativeAgent: > ``` > thread="false" startTime="false" period="endChunk" stackTrace="false"> > > > > > > > > > As you can see, there are two fields: initializationTime and initializationDuration.
> > We report these to let users understand when an agent was initialized (VMInit or Agent_OnAttach), together with the duration it took to execute either. For JavaAgents, it measures the invocation and duration of the premain or agentmain methods. > > "Initialized" does not mean "loaded" (at this point, all agents are loaded), but rather it means the agent has received a timestamp set as a function of VMInit. This timestamp and duration are what we will report in JFR as part of the event. > > An "uninitialized" agent is an agent who has not yet been timestamped, as part of VMInit, for example. Since an agent can create multiple JvmtiEnvs, the function is called lookup_uninitialized_agent() because we can only have a single timestamp for an agent, but it can, in turn, have multiple JvmtiEnvs. When looking up an agent again, using a second JvmtiEnv created by it, the agent is already "initialized", so no agent is returned. > > We cannot have the timestamping logic as part of the call out to Agent_OnLoad, because that call happens very early during VM bootstrap, so the Ticks support structures are not yet in place. But, timing the Agent_OnLoad call would be rather meaningless because the agent cannot do much except construct a JvmtiEnv and setting capabilities and callbacks. > > VMInit is where most of the invocation logic, at least for JavaAgents happens, so the measurements are placed there. Thank you for explaining it. Could you, please, add small comment explaining that it is for JFR purposes? 
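
The "one timestamp per agent, possibly many JvmtiEnvs" rule from the quoted explanation can be sketched as follows. This is only an illustration under stated assumptions: `Env`, `Agent`, `AgentRegistry`, `find_unstamped_agent` and `post_vm_init` are hypothetical stand-ins, not the real `JvmtiEnv`/`JvmtiAgent` classes, and the lookup here is keyed on the environment alone, whereas the quoted HotSpot call also passes the callback address.

```cpp
#include <chrono>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-ins for a JVMTI environment and the agent that created it.
struct Env { int id; };

struct Agent {
  const char* name;
  bool stamped = false;              // "initialized" in the JFR sense
  Clock::time_point init_time{};     // when the init callback started
  Clock::duration init_duration{};   // how long the callback took
};

class AgentRegistry {
 public:
  // An agent may create several environments; all of them map to the same agent.
  void bind(const Env* env, Agent* agent) { env_to_agent_[env] = agent; }

  // Returns the owning agent only if it has not been timestamped yet.
  Agent* find_unstamped_agent(const Env* env) const {
    auto it = env_to_agent_.find(env);
    if (it == env_to_agent_.end() || it->second->stamped) {
      return nullptr;
    }
    return it->second;
  }

 private:
  std::unordered_map<const Env*, Agent*> env_to_agent_;
};

// Runs the init callback for one environment and records the time and duration
// on the owning agent the first time any of its environments is called.
template <typename Callback>
void post_vm_init(const AgentRegistry& registry, const Env* env, Callback callback) {
  Agent* agent = registry.find_unstamped_agent(env);
  const Clock::time_point start = Clock::now();
  callback();
  if (agent != nullptr) {
    agent->init_time = start;
    agent->init_duration = Clock::now() - start;
    agent->stamped = true;
  }
}
```

With two environments bound to the same agent, the second `post_vm_init` call still runs its callback but leaves the recorded timestamp untouched, which mirrors the "no agent is returned" case described in the quote.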
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1167310779 From kvn at openjdk.org Sat Apr 15 00:20:36 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 15 Apr 2023 00:20:36 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v8] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 14 Apr 2023 20:54:45 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. 
It consists basically in: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straight forward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also tested with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR review 3. Some comments and be able to abort compilation. New test failed in tier1 on all platforms. 
Here is list: 1) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.TestTrapAfterMerge(boolean,int,int)" - [Failed IR rules: 1]: 2) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testCondLoadAfterMerge(boolean,boolean,int,int)" - [Failed IR rules: 1]: 3) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testLoadInCondAfterMerge(boolean,int,int)" - [Failed IR rules: 1]: 4) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testMergesAndMixedEscape(boolean,int,int)" - [Failed IR rules: 1]: 5) Method "compiler.c2.irTests.scalarReplacement.AllocationMergesTests$Point[] compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNestedObjectsArray(boolean,int,int)" - [Failed IR rules: 1]: 6) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNestedObjectsNoEscapeObject(boolean,int,int)" - [Failed IR rules: 1]: 7) Method "compiler.c2.irTests.scalarReplacement.AllocationMergesTests$Point compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNestedObjectsObject(boolean,int,int)" - [Failed IR rules: 1]: ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1509415967 From sspitsyn at openjdk.org Sat Apr 15 00:23:30 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 15 Apr 2023 00:23:30 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v2] In-Reply-To: References: Message-ID: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: 8304444: Reappearance of NULL in jvmtiThreadState.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13484/files - new: https://git.openjdk.org/jdk/pull/13484/files/1f042a16..7735ffac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From duke at openjdk.org Sat Apr 15 01:47:50 2023 From: duke at openjdk.org (duke) Date: Sat, 15 Apr 2023 01:47:50 GMT Subject: Withdrawn: 8299915: Remove ArrayAllocatorMallocLimit and associated code In-Reply-To: References: Message-ID: On Tue, 10 Jan 2023 20:55:12 GMT, Justin King wrote: > Remove abstraction that is a holdover from Solaris. Direct usages of `MmapArrayAllocator` have been switched to normal `malloc`. The justification is that none of the code paths are called from signal handlers, so using `mmap` directly does not make sense and is potentially slower than going through `malloc` which can potentially re-use memory without making any system calls. The remaining usages of `ArrayAllocator` and `MallocArrayAllocator` are equivalent. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/11931 From duke at openjdk.org Sat Apr 15 02:21:53 2023 From: duke at openjdk.org (duke) Date: Sat, 15 Apr 2023 02:21:53 GMT Subject: Withdrawn: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 11 Oct 2022 16:02:41 GMT, Andrew Haley wrote: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From kbarrett at openjdk.org Sat Apr 15 04:32:38 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 15 Apr 2023 04:32:38 GMT Subject: RFR: 8306029: ProblemList runtime/ErrorHandling/TestDwarf.java on linux Message-ID: A trivial fix to change problem-listing of runtime/ErrorHandling/TestDwarf.java from linux-i586 to linux-all, since there's no reason to think it's limited to just 32bit platforms. 
------------- Commit messages: - expand problem listing to linux-all Changes: https://git.openjdk.org/jdk/pull/13485/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13485&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306029 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13485/head:pull/13485 PR: https://git.openjdk.org/jdk/pull/13485 From stuefe at openjdk.org Sat Apr 15 05:06:40 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 15 Apr 2023 05:06:40 GMT Subject: RFR: 8306029: ProblemList runtime/ErrorHandling/TestDwarf.java on linux In-Reply-To: References: Message-ID: On Sat, 15 Apr 2023 00:23:21 GMT, Kim Barrett wrote: > A trivial fix to change problem-listing of runtime/ErrorHandling/TestDwarf.java > from linux-i586 to linux-all, since there's no reason to think it's limited to > just 32bit platforms. ok and trivial ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13485#pullrequestreview-1386312144 From kbarrett at openjdk.org Sat Apr 15 05:18:42 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 15 Apr 2023 05:18:42 GMT Subject: Integrated: 8306029: ProblemList runtime/ErrorHandling/TestDwarf.java on linux In-Reply-To: References: Message-ID: On Sat, 15 Apr 2023 00:23:21 GMT, Kim Barrett wrote: > A trivial fix to change problem-listing of runtime/ErrorHandling/TestDwarf.java > from linux-i586 to linux-all, since there's no reason to think it's limited to > just 32bit platforms. This pull request has now been integrated. 
Changeset: caa841d9 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/caa841d9a52352a975394e5506fbc56563df9321 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8306029: ProblemList runtime/ErrorHandling/TestDwarf.java on linux Reviewed-by: stuefe ------------- PR: https://git.openjdk.org/jdk/pull/13485 From kbarrett at openjdk.org Sat Apr 15 05:18:41 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 15 Apr 2023 05:18:41 GMT Subject: RFR: 8306029: ProblemList runtime/ErrorHandling/TestDwarf.java on linux In-Reply-To: References: Message-ID: On Sat, 15 Apr 2023 05:03:21 GMT, Thomas Stuefe wrote: >> A trivial fix to change problem-listing of runtime/ErrorHandling/TestDwarf.java >> from linux-i586 to linux-all, since there's no reason to think it's limited to >> just 32bit platforms. > > ok and trivial Thanks @tstuefe ------------- PR Comment: https://git.openjdk.org/jdk/pull/13485#issuecomment-1509514621 From rrich at openjdk.org Sat Apr 15 06:08:33 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 15 Apr 2023 06:08:33 GMT Subject: RFR: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) [v2] In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 19:34:39 GMT, Patricio Chilano Mateo wrote: >> Please review this fix. The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. 
Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. >> The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. >> The fix was tested against a reproducer which I attached to the bug. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add explicit nullptr check > - modify check to include plain continuations Marked as reviewed by rrich (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13446#pullrequestreview-1386321972 From dholmes at openjdk.org Mon Apr 17 04:47:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Apr 2023 04:47:35 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:55:07 GMT, Matthias Baesken wrote: >> The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. >> However at some places INCLUDE_CDS is missing and should be added. >> >> One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros ? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust arguments handling I'm a bit confused about the testing situation as you indicated `-Xshare:off` causes an error on AIX without CDS, yet Ioi showed it doesn't cause an error when used with the MinimalVM which also doesn't have CDS. 
I think Ioi's earlier suggestion regarding handling of the different `-Xshare:` flags is worth looking at from the testing perspective (only `on` should cause an error on systems without CDS). But that would be a separate issue. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12691#issuecomment-1510688864 From fyang at openjdk.org Mon Apr 17 06:52:33 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Apr 2023 06:52:33 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: References: Message-ID: <45-ONQI9dSdGx8chKpms07dh_TfXJUWWCAsYiAd32rM=.cd749431-c12e-42ab-a607-e4ca60bce69b@github.com> On Fri, 14 Apr 2023 13:45:12 GMT, Fredrik Bredberg wrote: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. > > This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. > > By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. 
src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2160: > 2158: copy_from_chunk(heap_frame_top, stack_frame_top, fsize); > 2159: > 2160: set_interpreter_frame_bottom(f, stack_frame_bottom); // the copy overwrites the metadata Since ThawBase::set_interpreter_frame_bottom has nothing to do after this change, I think it might be cleaner to remove this function at the same time? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168251005 From mgronlun at openjdk.org Mon Apr 17 09:17:45 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 17 Apr 2023 09:17:45 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v17] In-Reply-To: References: <0gj2kjWDMnEpt_58UH60JCzURC9ZuOowVLom48A69n0=.0c35359f-514a-4526-95ae-2a256d9372d7@github.com> Message-ID: On Fri, 14 Apr 2023 22:22:13 GMT, Serguei Spitsyn wrote: >> Ok. the terminology here might be confusing. The concept of an agent being "initialized" is introduced and reported by JFR. For example, here is the JFR event type definition for a NativeAgent: >> ``` >> > thread="false" startTime="false" period="endChunk" stackTrace="false"> >> >> >> >> >> >> >> >> >> As you can see, there are two fields: initializationTime and IntializationDuration. >> >> We report these to let users understand when an agent was initialized (VMInit or Agent_OnAttach), together with the duration it took to execute either. For JavaAgents, it measures the invocation and duration of the premain or agentmain methods. >> >> "Initialized" does not mean "loaded" (at this point, all agents are loaded), but rather it means the agent has received a timestamp set as a function of VMInit. This timestamp and duration are what we will report in JFR as part of the event. >> >> An "uninitialized" agent is an agent who has not yet been timestamped, as part of VMInit, for example. 
Since an agent can create multiple JvmtiEnvs, the function is called lookup_uninitialized_agent() because we can only have a single timestamp for an agent, but it can, in turn, have multiple JvmtiEnvs. When looking up an agent again, using a second JvmtiEnv created by it, the agent is already "initialized", so no agent is returned. >> >> We cannot have the timestamping logic as part of the call out to Agent_OnLoad, because that call happens very early during VM bootstrap, so the Ticks support structures are not yet in place. But, timing the Agent_OnLoad call would be rather meaningless because the agent cannot do much except construct a JvmtiEnv and setting capabilities and callbacks. >> >> VMInit is where most of the invocation logic, at least for JavaAgents happens, so the measurements are placed there. > > Thank you for explaining it. > Could you, please, add small comment explaining that it is for JFR purposes? Will do. Thank you, Serguei. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1168407443 From mgronlun at openjdk.org Mon Apr 17 09:36:56 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 17 Apr 2023 09:36:56 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v18] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. 
> > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initializationTime = 12:31:15.574 (2023-03-08) > initializationDuration = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initializationTime = 12:31:31.037 (2023-03-08) > initializationDuration = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). 
This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initializationTime = 12:31:36.142 (2023-03-08) > initializationDuration = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent.
This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: - capital letter - explanatory comment in jvmtiExport.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/85f06038..140079c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=16-17 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Mon Apr 17 09:37:44 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 17 Apr 2023 09:37:44 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 00:23:39 GMT, David Holmes wrote: >> No need to load any JFR classes. No change to startup logic. > >> No need to load any JFR classes. > > I thought JFR was all Java-based these days.
But if no Java involved then that is good. > >> No change to startup logic. > > I flagged a change in my comment above. Thank you @dholmes-ora, @sspitsyn and @adinn for your reviews and sticking with this one for quite some time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12923#issuecomment-1511011114 From mgronlun at openjdk.org Mon Apr 17 10:29:01 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 17 Apr 2023 10:29:01 GMT Subject: Integrated: 8257967: JFR: Events for loaded agents In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 12:41:15 GMT, Markus Grönlund wrote: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initializationTime = 12:31:15.574 (2023-03-08) > initializationDuration = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initializationTime = 12:31:31.037 (2023-03-08) > initializationDuration = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent: JavaAgent.jar=foo=bar. > > The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) > > "initializationTime" is the timestamp the JVM invoked the initialization method, and "initializationDuration" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initializationTime" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. 
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initializationTime = 12:31:36.142 (2023-03-08) > initializationDuration = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initializationTime" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationDuration" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationDuration" will be 0. If the agent is loaded dynamically, "initializationDuration" is the time taken to execute the Agent_OnAttach callback. > > #### Implementation > > There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary.
The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus This pull request has now been integrated. Changeset: 5c95bb1c Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/5c95bb1c5146e13dd213d5ca6e02e2a02ca0323e Stats: 1893 lines in 23 files changed: 1375 ins; 487 del; 31 mod 8257967: JFR: Events for loaded agents Reviewed-by: dholmes, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12923 From rrich at openjdk.org Mon Apr 17 10:33:39 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Apr 2023 10:33:39 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 13:45:12 GMT, Fredrik Bredberg wrote: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. > > This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing.
> > By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. I've tested jdk_loom on ppc64le successfully. Thanks, Richard. src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 267: > 265: intptr_t *sp, *fp; > 266: if (FKind::interpreted) { > 267: intptr_t offset = *f.addr_at(ijava_idx(locals)); I'd prefer a more specific name. Maybe `locals_offset` or `local0_offset`? src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 510: > 508: // we need to set the locals so that the caller of new_stack_frame() can call > 509: // ContinuationHelper::InterpretedFrame::frame_bottom > 510: // copy relativized locals from the heap frame Maybe reduce the comment? // we need to copy the locals so that the caller of new_stack_frame() can call // ContinuationHelper::InterpretedFrame::frame_bottom src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1092: > 1090: assert(heap_frame_bottom == heap_frame_top + fsize, ""); > 1091: > 1092: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. Typo Suggestion: // Some architectures (like AArch64/PPC64/RISC-V) add padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1093: > 1091: > 1092: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. > 1093: // On those architectures we freeze the padding in order to keep the same localized pointer values. Suggestion: // On those architectures we freeze the padding in order to keep the same relative references. 
src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2156: > 2154: assert(!f.is_heap_frame(), "should not be"); > 2155: > 2156: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. Suggestion: // Some architectures (like AArch64/PPC64/RISC-V) add padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. ------------- PR Review: https://git.openjdk.org/jdk/pull/13477#pullrequestreview-1387695274 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168464062 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168459316 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168473037 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168479719 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168480534 From duke at openjdk.org Mon Apr 17 14:14:43 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Mon, 17 Apr 2023 14:14:43 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: <45-ONQI9dSdGx8chKpms07dh_TfXJUWWCAsYiAd32rM=.cd749431-c12e-42ab-a607-e4ca60bce69b@github.com> References: <45-ONQI9dSdGx8chKpms07dh_TfXJUWWCAsYiAd32rM=.cd749431-c12e-42ab-a607-e4ca60bce69b@github.com> Message-ID: <8gSnJ7pOgAbvd6pjxrA3DBtnJ4tC-G6LbHGClDmEKhU=.8ebbe4e2-784f-41a5-8d9f-13bc65de0eee@github.com> On Mon, 17 Apr 2023 06:49:37 GMT, Fei Yang wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. 
>> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2160: > >> 2158: copy_from_chunk(heap_frame_top, stack_frame_top, fsize); >> 2159: >> 2160: set_interpreter_frame_bottom(f, stack_frame_bottom); // the copy overwrites the metadata > > Since ThawBase::set_interpreter_frame_bottom has nothing to do after this change, I think it might be cleaner to remove this function at the same time? I see what you mean, but I chose to keep it because of the assert() in ThawBase::set_interpreter_frame_bottom. After all, it was this assert that alerted me to the JDK-8305247 bug. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168764973 From pchilanomate at openjdk.org Mon Apr 17 14:43:45 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 17 Apr 2023 14:43:45 GMT Subject: Integrated: 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) In-Reply-To: References: Message-ID: On Wed, 12 Apr 2023 15:48:53 GMT, Patricio Chilano Mateo wrote: > Please review this fix. The check to skip walking stacks of virtual threads will not identify a thread in a transition since it relies on the jvmti_vthread() which would have already changed at the very beginning of it. The crash happens because the anchor might have changed between walking the stack of the thread in a transition and executing the deopt handshake for a particular frame. 
The frame is never found and looping executing fr.sender() crashes. This scenario can happen if the initial EscapeBarrierSuspendHandshake executed to synchronize against all threads finds the thread blocked in the stackchunk allocation path. Because the thread will actually block on the next transition to Java, and not on a blocked->vm transition, it will continue executing and change its anchor while the requester is walking its stack. There are more details in the bug comments. > The fix modifies the conditional to check if the continuation is mounted or not. This will identify the transition case too and won't face the anchor change issue since the continuation entry will be removed after returning from the freeze call. > The fix was tested against a reproducer which I attached to the bug. > > Thanks, > Patricio This pull request has now been integrated. Changeset: 73609604 Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/7360960454b3116a0724396f25415f2c3bcf8930 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8305625: Stress test crashes with SEGV in Deoptimization::deoptimize_frame_internal(JavaThread*, long*, Deoptimization::DeoptReason) Reviewed-by: rrich, rehn ------------- PR: https://git.openjdk.org/jdk/pull/13446 From rrich at openjdk.org Mon Apr 17 14:47:33 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Apr 2023 14:47:33 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: References: Message-ID: <4GITjfS1eT5rMwMKT6-ZdSEN6KXZQms8CaKSfK_rAoE=.211c7cd5-bb14-4bfa-b5a0-23ead8251206@github.com> On Fri, 14 Apr 2023 13:45:12 GMT, Fredrik Bredberg wrote: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. 
> > This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. > > By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2157: > 2155: > 2156: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. > 2157: // On those architectures we thaw the padding in order to keep the same localized pointer values. Somehow this suggestion was not delivered... Suggestion: // On those architectures we thaw the padding in order to keep the same relative references. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168830476 From rrich at openjdk.org Mon Apr 17 14:47:36 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Apr 2023 14:47:36 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: <8gSnJ7pOgAbvd6pjxrA3DBtnJ4tC-G6LbHGClDmEKhU=.8ebbe4e2-784f-41a5-8d9f-13bc65de0eee@github.com> References: <45-ONQI9dSdGx8chKpms07dh_TfXJUWWCAsYiAd32rM=.cd749431-c12e-42ab-a607-e4ca60bce69b@github.com> <8gSnJ7pOgAbvd6pjxrA3DBtnJ4tC-G6LbHGClDmEKhU=.8ebbe4e2-784f-41a5-8d9f-13bc65de0eee@github.com> Message-ID: <5Lh1b1RiwJy5YfgQcQhmNy-ufET1y_gxidS-ldEMFh4=.abbcb2f8-0022-4334-a21f-f70c7f4bad92@github.com> On Mon, 17 Apr 2023 14:12:02 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2160: >> >>> 2158: copy_from_chunk(heap_frame_top, stack_frame_top, fsize); >>> 2159: >>> 2160: set_interpreter_frame_bottom(f, stack_frame_bottom); // the copy overwrites the metadata >> >> Since ThawBase::set_interpreter_frame_bottom has nothing to do after this change, I think it might be cleaner to remove this function at the same time? > > I see what you mean, but I chose to keep it because of the assert() in ThawBase::set_interpreter_frame_bottom. > After all, it was this assert that alerted me to the JDK-8305247 bug. Is it possible to get an equivalent but platform independent version of the assertion? Something like `assert(f.interpreter_frame_local_at(0) == stack_frame_bottom - 1, "");` might work. It could replace the call of `set_interpreter_frame_bottom()`. After all with this pr no platform will ever have to actually set the interpreter frame bottom so it would be good to at least rename the method. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1168825041 From cstein at openjdk.org Mon Apr 17 15:03:29 2023 From: cstein at openjdk.org (Christian Stein) Date: Mon, 17 Apr 2023 15:03:29 GMT Subject: RFR: 8304896: Update to use jtreg 7.2 Message-ID: Please review the change to update to using jtreg 7.2. The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. ------------- Commit messages: - JDK-8304896: Updated to use jtreg 7.2 - JDK-8304896: Back to use build number 1 - JDK-8304896: Use CI build number - JDK-8304896: Update to use jtreg 7.2 Changes: https://git.openjdk.org/jdk/pull/13496/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13496&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304896 Stats: 9 lines in 8 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/13496.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13496/head:pull/13496 PR: https://git.openjdk.org/jdk/pull/13496 From darcy at openjdk.org Mon Apr 17 15:27:38 2023 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 17 Apr 2023 15:27:38 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:38:49 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. 
>> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] formatting fixes src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 184: > 182: > 183: /** > 184: * Parses {@code rawAnnotationBytes} into a list of {@link Annotation}s and then Nit: the parameter is named "rawAnnotations" rather than "rawAnnotationBytes". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1168897797 From darcy at openjdk.org Mon Apr 17 15:35:38 2023 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 17 Apr 2023 15:35:38 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:38:49 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. 
The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] formatting fixes src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 234: > 232: * Encodes a list of annotations to a byte array. The byte array can be decoded with {@link #decodeAnnotations(byte[], AnnotationDecoder)}. 
> 233: */ > 234: public static byte[] encodeAnnotations(Collection annotations) { I don't think it matters much in this use case, but it looks like encodeAnnotations could be changed to take a List rather than a Collection, as the comment implies. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1168913137 From darcy at openjdk.org Mon Apr 17 15:51:38 2023 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 17 Apr 2023 15:51:38 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:38:49 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... 
>> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] formatting fixes src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 419: > 417: * @param type of the object representing a decoded error > 418: */ > 419: public interface AnnotationDecoder { I think it would be better to include some bounds on the type parameters to better capture their intention: A extends java.lang.annotation.Annotation, E extends java.lang.Enum, etc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1168933556 From erikj at openjdk.org Mon Apr 17 16:04:33 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 17 Apr 2023 16:04:33 GMT Subject: RFR: 8304896: Update to use jtreg 7.2 In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 14:56:16 GMT, Christian Stein wrote: > Please review the change to update to using jtreg 7.2. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. Marked as reviewed by erikj (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/13496#pullrequestreview-1388478429 From cslucas at openjdk.org Mon Apr 17 16:17:30 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 17 Apr 2023 16:17:30 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V.
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge; 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge, which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also experimented with several applications and didn't see any failures. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix tests. Remember previous reducible Phis.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/a10b0a4c..aec1b07a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=07-08 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From darcy at openjdk.org Mon Apr 17 16:53:37 2023 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 17 Apr 2023 16:53:37 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:38:49 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... 
>> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] formatting fixes A few higher-level comments: From the long-term perspective, it is likely that the set of kinds of elements that can occur in an annotation will be expanded; for example, method references are a repeated request. To ease future maintenance, adding more inter-source linkage for this situation, and error handling for this case in the libgraal code, would be prudent IMO. The java.lang.reflect.AnnotatedElement API (https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/reflect/AnnotatedElement.html) defines different ways an annotation can be affiliated with an element: "The terms directly present, indirectly present, present, and associated are used throughout this interface to describe precisely which annotations are returned by methods: " tl;dr: these terms describe whether the methods take inherited annotations into account and look through repeated annotations on behalf of the caller. I think the methods should phrase their operations in terms of these concepts, such as "Construct the annotations present..." since inheritance is taken into account.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1511735022 From coleenp at openjdk.org Mon Apr 17 18:46:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 17 Apr 2023 18:46:05 GMT Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390 In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:40:09 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Looks good! ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/12325#pullrequestreview-1388744906 From pchilanomate at openjdk.org Mon Apr 17 19:08:40 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 17 Apr 2023 19:08:40 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 13:45:12 GMT, Fredrik Bredberg wrote: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. > > This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing. > > By both freezing and thawing the padding we remove the need to change any relative offsets at runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity tested using QEMU. Looks good to me. src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2152: > 2150: > 2151: assert((stack_frame_bottom >= stack_frame_top + fsize) && > 2152: (stack_frame_bottom <= stack_frame_top + fsize + 1), ""); // internal alignment on aarch64 Since we didn't add any new padding, shouldn't this assert now be `stack_frame_bottom == stack_frame_top + fsize`?
------------- PR Review: https://git.openjdk.org/jdk/pull/13477#pullrequestreview-1388783799 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1169164544 From duke at openjdk.org Mon Apr 17 19:16:10 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Mon, 17 Apr 2023 19:16:10 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: <5Lh1b1RiwJy5YfgQcQhmNy-ufET1y_gxidS-ldEMFh4=.abbcb2f8-0022-4334-a21f-f70c7f4bad92@github.com> References: <45-ONQI9dSdGx8chKpms07dh_TfXJUWWCAsYiAd32rM=.cd749431-c12e-42ab-a607-e4ca60bce69b@github.com> <8gSnJ7pOgAbvd6pjxrA3DBtnJ4tC-G6LbHGClDmEKhU=.8ebbe4e2-784f-41a5-8d9f-13bc65de0eee@github.com> <5Lh1b1RiwJy5YfgQcQhmNy-ufET1y_gxidS-ldEMFh4=.abbcb2f8-0022-4334-a21f-f70c7f4bad92@github.com> Message-ID: On Mon, 17 Apr 2023 14:41:23 GMT, Richard Reingruber wrote: >> I see what you mean, but I chose to keep it because of the assert() in ThawBase::set_interpreter_frame_bottom. >> After all, it was this assert that alerted me to the JDK-8305247 bug. > > Is it possible to get an equivalent but platform independent version of the assertion? > Something like `assert(f.interpreter_frame_local_at(0) == stack_frame_bottom - 1, "");` might work. > It could replace the call of `set_interpreter_frame_bottom()`. > After all with this pr no platform will ever have to actually set the interpreter frame bottom so it would be good to at least rename the method. Sounds like a plan. I'll look into it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1169174223 From rkennke at openjdk.org Mon Apr 17 20:04:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 17 Apr 2023 20:04:26 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v60] In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 11:19:32 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and using only heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR. >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
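As an illustration of the lock-stack described in the quoted PR text, here is a minimal sketch under stated assumptions. It is not the HotSpot `LockStack` implementation: the class name `LockStackSketch`, the `ObjRef` alias and the `try_push`/`try_pop` methods are hypothetical, there is no CAS on the object header, and unlocking is assumed to happen in strict LIFO order. It only models the fixed capacity of 8, the linear-scan ownership query, and the two cases in which the caller would fall back to the slow path (overflow and recursive locking).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an object reference (an oop in HotSpot).
using ObjRef = const void*;

// Sketch of a per-thread lock-stack: a small fixed-size array of references
// to the objects this thread currently holds fast-locked.
class LockStackSketch {
    static constexpr std::size_t CAPACITY = 8;  // fixed size, per the description
    ObjRef _base[CAPACITY] = {};
    std::size_t _top = 0;

public:
    bool is_full() const { return _top == CAPACITY; }

    // "Does the current thread own me?" answered by a quick linear scan.
    bool contains(ObjRef obj) const {
        for (std::size_t i = 0; i < _top; ++i) {
            if (_base[i] == obj) return true;
        }
        return false;
    }

    // Returns false when the caller would have to take the slow path and
    // inflate to a full monitor: the stack is full, or the lock is recursive
    // (recursive fast-locking is not supported in the described scheme).
    bool try_push(ObjRef obj) {
        if (is_full() || contains(obj)) return false;
        _base[_top++] = obj;
        return true;
    }

    // Assumption of this sketch: locks are released in LIFO order, so the
    // object being unlocked must be on top of the stack.
    bool try_pop(ObjRef obj) {
        if (_top == 0 || _base[_top - 1] != obj) return false;
        --_top;
        return true;
    }
};
```

A caller in this model would attempt `try_push` on monitorenter and treat a `false` result as "inflate and use the monitor path". The ANONYMOUS_OWNER hand-over described in the quoted text is deliberately left out.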
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 156 commits:
>
> - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2
> - A few more LM_ prefixes in 32bit code
> - Replace UseHeavyMonitor with LockingMode == LM_MONITOR
> - Prefix LockingMode constants with LM_*
> - Bunch of comments and typos
> - Don't use NativeAccess in LockStack::contains()
> - RISCV update
> - Put back thread type check in OS::is_lock_owned()
> - Named constants for LockingMode
> - Address David's review comments
> - ... and 146 more: https://git.openjdk.org/jdk/compare/d2ce04bb...d0a448c6

Hi there, what is needed to bring this PR over the approval line?

------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1512003092
From rkennke at openjdk.org Mon Apr 17 20:10:38 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 17 Apr 2023 20:10:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v61] In-Reply-To: References: Message-ID:
> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation.
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
> 
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
> 
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
> 
> The lock-stack is fixed size, currently with 8 elements.
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
> 
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
> 
> 
> #### Renaissance
> 
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Simple build fix for extra arches

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/d0a448c6..c3486726

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=60
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=59-60

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From dnsimon at openjdk.org Mon Apr 17 20:27:08 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 17 Apr 2023 20:27:08 GMT
Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Apr 2023 15:48:53 GMT, Joe Darcy wrote:

>> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   [skip ci] formatting fixes
>
> src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 419:
>
>> 417: * @param type of the object representing a decoded error
>> 418: */
>> 419: public interface AnnotationDecoder {
>
> I think it would be better to include some bound on the type parameters to better capture their intention
> A extends java.lang.Annotation
> E extends java.lang.Enum
> etc.

These types are *alternatives* to `java.lang.Annotation`, `java.lang.Enum` etc. That's the primary motivation for this PR, i.e. to be able to represent annotations without having to reify them as `java.lang.Annotation` objects.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1169232602 From dnsimon at openjdk.org Mon Apr 17 20:36:48 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 17 Apr 2023 20:36:48 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 15:32:56 GMT, Joe Darcy wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> [skip ci] formatting fixes > > src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 234: > >> 232: * Encodes a list of annotations to a byte array. The byte array can be decoded with {@link #decodeAnnotations(byte[], AnnotationDecoder)}. >> 233: */ >> 234: public static byte[] encodeAnnotations(Collection annotations) { > > I don't think it matters much in this use case, but it looks like encodeAnnotations could be changed to take a List rather than a Collection, as the comment implies. Just above (line 228) you can see a call to this method where the argument comes from `Map.values()` whose type is `Collection` so I'd prefer to leave it as is rather than have to convert the argument to a `List`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1169240341 From dnsimon at openjdk.org Mon Apr 17 20:36:49 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 17 Apr 2023 20:36:49 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 20:33:26 GMT, Doug Simon wrote: >> src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 234: >> >>> 232: * Encodes a list of annotations to a byte array. The byte array can be decoded with {@link #decodeAnnotations(byte[], AnnotationDecoder)}. 
>>> 233: */ >>> 234: public static byte[] encodeAnnotations(Collection annotations) { >> >> I don't think it matters much in this use case, but it looks like encodeAnnotations could be changed to take a List rather than a Collection, as the comment implies. > > Just above (line 228) you can see a call to this method where the argument comes from `Map.values()` whose type is `Collection` so I'd prefer to leave it as is rather than have to convert the argument to a `List`. Just above (line 228) you can see a call to this method where the argument comes from `Map.values()` whose type is `Collection` so I'd prefer to leave it as is rather than have to convert the argument to a `List`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1169240392 From dnsimon at openjdk.org Mon Apr 17 20:49:43 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 17 Apr 2023 20:49:43 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 16:50:47 GMT, Joe Darcy wrote: > From the long-term perspective, it is likely that the set of kinds of elements that can occur in an annotation will be expanded, for example, method references are a repeated request. Easing future maintenance to gives more inter-source linkage in this situation and error handling for this case in the libgraal code would be prudent IMO. I'm not sure what you're suggesting in terms of how I should update this PR. Do you mean `AnnotationData.get` needs to somehow be more flexible? If so, could you please give a concrete example of what you're after. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1512063544

From dnsimon at openjdk.org Mon Apr 17 21:02:50 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 17 Apr 2023 21:02:50 GMT
Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v8]
In-Reply-To:
References:
Message-ID:

> This PR extends JVMCI with a new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are:
> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`.
> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal.
>
> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements):
>
>     ResolvedJavaMethod method = ...;
>     ExplodeLoop a = method.getAnnotation(ExplodeLoop.class);
>     return switch (a.kind()) {
>         case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL;
>         case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     }
>
>
> The same code using the new API:
>
>
>     ResolvedJavaMethod method = ...;
>     ResolvedJavaType explodeLoopType = ...;
>     AnnotationData a = method.getAnnotationDataFor(explodeLoopType);
>     return switch (a.getEnum("kind").getName()) {
>         case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL;
>         case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     }
>
>
> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
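
For readers who want to try the "map-like object" idea from the description above outside of JVMCI, here is a minimal, self-contained sketch. It is an illustration only: `AnnotationDataSketch`, `Data`, `EnumRef` and the local `LoopExplosionKind` enum are hypothetical stand-ins invented for this example, not the `jdk.vm.ci.meta` classes, and the real API may differ in names and behavior. The point it shows is that an enum element can be carried as a plain name, so the annotation and enum classes never need to be loaded by the consumer.

```java
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the JVMCI classes.
// Requires Java 16+ (records, pattern matching for instanceof).
public class AnnotationDataSketch {
    /** An enum constant carried by name, so the enum class need not be loaded. */
    record EnumRef(String typeName, String name) {}

    /** Map-like view of one annotation: element name -> decoded value. */
    record Data(String annotationTypeName, Map<String, Object> elements) {
        EnumRef getEnum(String element) {
            Object value = elements.get(element);
            if (value instanceof EnumRef e) {
                return e;
            }
            throw new IllegalArgumentException(
                annotationTypeName + "." + element + " is not an enum element: " + value);
        }
    }

    /** Local enum standing in for the consumer-side type in the quoted example. */
    enum LoopExplosionKind { FULL_UNROLL, FULL_UNROLL_UNTIL_RETURN }

    /** Maps the string-encoded enum element to the consumer's own enum. */
    static LoopExplosionKind toKind(Data a) {
        return switch (a.getEnum("kind").name()) {
            case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL;
            case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
            // Reject values this code does not know about instead of guessing.
            default -> throw new AssertionError("unexpected kind");
        };
    }

    public static void main(String[] args) {
        Data a = new Data("ExplodeLoop",
            Map.of("kind", new EnumRef("LoopExplosionKind", "FULL_UNROLL")));
        System.out.println(toKind(a));
    }
}
```

The `default` branch is a design choice of this sketch: failing loudly on an unknown value makes a missed update visible early, which is in the spirit of the maintenance concern raised in the review thread.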
Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - rephrased javadoc Annotated to more precisely describe which annotations are returned - fixed comment - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 - [skip ci] formatting fixes - addressed review feedback - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 - switched to use of lists and maps instead of arrays - fixed whitespace - added support for inherited annotations - Merge branch 'master' into JDK-8303431 - ... and 2 more: https://git.openjdk.org/jdk/compare/525a91e3...362738a6 ------------- Changes: https://git.openjdk.org/jdk/pull/12810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=07 Stats: 2319 lines in 34 files changed: 2268 ins; 23 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From dnsimon at openjdk.org Mon Apr 17 21:02:51 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 17 Apr 2023 21:02:51 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 16:50:47 GMT, Joe Darcy wrote: > the methods should phrase their operations in terms of these concepts... I think this is what you're suggesting: https://github.com/openjdk/jdk/pull/12810/commits/362738a61410cc8d60d8c4c4fc9e3e8ed0393aed ------------- PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1512074497 From lmesnik at openjdk.org Mon Apr 17 21:43:43 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 17 Apr 2023 21:43:43 GMT Subject: RFR: 8304896: Update to use jtreg 7.2 In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 14:56:16 GMT, Christian Stein wrote: > Please review the change to update to using jtreg 7.2. 
> > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13496#pullrequestreview-1388984542 From iris at openjdk.org Mon Apr 17 21:48:42 2023 From: iris at openjdk.org (Iris Clark) Date: Mon, 17 Apr 2023 21:48:42 GMT Subject: RFR: 8304896: Update to use jtreg 7.2 In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 14:56:16 GMT, Christian Stein wrote: > Please review the change to update to using jtreg 7.2. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. Marked as reviewed by iris (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13496#pullrequestreview-1388989497 From cslucas at openjdk.org Mon Apr 17 22:11:46 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 17 Apr 2023 22:11:46 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v8] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Sat, 15 Apr 2023 00:17:55 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR review 3. Some comments and be able to abort compilation. > > New test failed in tier1 on all platforms. 
Here is list: > > 1) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.TestTrapAfterMerge(boolean,int,int)" - [Failed IR rules: 1]: > 2) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testCondLoadAfterMerge(boolean,boolean,int,int)" - [Failed IR rules: 1]: > 3) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testLoadInCondAfterMerge(boolean,int,int)" - [Failed IR rules: 1]: > 4) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testMergesAndMixedEscape(boolean,int,int)" - [Failed IR rules: 1]: > 5) Method "compiler.c2.irTests.scalarReplacement.AllocationMergesTests$Point[] compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNestedObjectsArray(boolean,int,int)" - [Failed IR rules: 1]: > 6) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNestedObjectsNoEscapeObject(boolean,int,int)" - [Failed IR rules: 1]: > 7) Method "compiler.c2.irTests.scalarReplacement.AllocationMergesTests$Point compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testNestedObjectsObject(boolean,int,int)" - [Failed IR rules: 1]: @vnkozlov - sorry about that. I fixed the code now and all GHA tests are passing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1512148978 From duke at openjdk.org Tue Apr 18 00:38:57 2023 From: duke at openjdk.org (duke) Date: Tue, 18 Apr 2023 00:38:57 GMT Subject: Withdrawn: 8290903: Enable function warning attribute for Clang build once Clang supports merging In-Reply-To: References: Message-ID: On Sat, 18 Feb 2023 15:26:04 GMT, Afshin Zafari wrote: > The warning attribute is enabled.. > > ### Test > mach5 tier1-5. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/12634 From duke at openjdk.org Tue Apr 18 01:05:55 2023 From: duke at openjdk.org (duke) Date: Tue, 18 Apr 2023 01:05:55 GMT Subject: Withdrawn: 8295382: Implement SHA-256 Intrinsic on RISC-V In-Reply-To: <1JWd-CDS_jpIDtfu7HJAVmvViShKzVTrCOxDVBZ9GSo=.904a8e56-794c-43d6-8448-de2a1a856f33@github.com> References: <1JWd-CDS_jpIDtfu7HJAVmvViShKzVTrCOxDVBZ9GSo=.904a8e56-794c-43d6-8448-de2a1a856f33@github.com> Message-ID: On Thu, 26 Jan 2023 01:47:10 GMT, Ludovic Henry wrote: > This has been tested with patches currently being submitted to QEMU to add support for Zvkb and Zvknha extensions. > > The documentation for the Vector Crypto extension's instructions is available at https://github.com/riscv/riscv-crypto/tree/master/doc/vector/insns This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12208 From darcy at openjdk.org Tue Apr 18 01:09:56 2023 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 18 Apr 2023 01:09:56 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 20:33:30 GMT, Doug Simon wrote: >> src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 234: >> >>> 232: * Encodes a list of annotations to a byte array. The byte array can be decoded with {@link #decodeAnnotations(byte[], AnnotationDecoder)}. >>> 233: */ >>> 234: public static byte[] encodeAnnotations(Collection annotations) { >> >> I don't think it matters much in this use case, but it looks like encodeAnnotations could be changed to take a List rather than a Collection, as the comment implies. > > Just above (line 228) you can see a call to this method where the argument comes from `Map.values()` whose type is `Collection` so I'd prefer to leave it as is rather than have to convert the argument to a `List`. In that case, I think the comment on line 232 should be updated to not say "list". 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1169391239

From duke at openjdk.org Tue Apr 18 01:15:59 2023
From: duke at openjdk.org (duke)
Date: Tue, 18 Apr 2023 01:15:59 GMT
Subject: Withdrawn: JDK-8300080: offset_of for GCC/Clang exhibits undefined behavior and is not always a compile-time constant
In-Reply-To:
References:
Message-ID:

On Thu, 12 Jan 2023 20:28:50 GMT, Justin King wrote:

> The implementation of `offset_of` for GCC/Clang only deals with types that are aligned to 16 bytes or less; for types with larger alignment, such as `zCollectedHeap`, the behavior is undefined. UBSan also suggests that `offset_of` is not always a compile time constant, as the stack trace came from the dynamic loader during library loading. This patch changes `offset_of` to use `offsetof` and disables the warning `invalid-offsetof` for the JVM.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/11978

From darcy at openjdk.org Tue Apr 18 02:24:53 2023
From: darcy at openjdk.org (Joe Darcy)
Date: Tue, 18 Apr 2023 02:24:53 GMT
Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Apr 2023 20:46:36 GMT, Doug Simon wrote:

> > From the long-term perspective, it is likely that the set of kinds of elements that can occur in an annotation will be expanded, for example, method references are a repeated request. Easing future maintenance to gives more inter-source linkage in this situation and error handling for this case in the libgraal code would be prudent IMO.
>
> I'm not sure what you're suggesting in terms of how I should update this PR. Do you mean `AnnotationData.get` needs to somehow be more flexible? If so, could you please give a concrete example of what you're after.

Let me explain my concerns in more detail. We have (at least) two separate annotation implementations in the JDK, one in javac for compile-time and the other in core reflection for runtime. Due to various technical constraints, the implementations are necessarily separate (although the annotation objects constructed by javac do also implement the java.lang.annotation.Annotation interface also used by core reflection). This work is proposing to add another partial implementation to satisfy other technical goals, reusing portions of the existing core reflection machinery. If at some point the universe of objects that can be encoded in an annotation is expanded, e.g., method literals as mentioned previously, it is well-understood that core reflection and javac will need to be updated. I think it would be relatively easy to overlook the need to make corresponding updates to the libgraal API and all regression tests might still pass. As a concrete suggestion, if such an unknown annotation was handed over to libgraal, I think the code should reject it (AssertionError("unexpected annotation component") etc.) in hope of the omission getting corrected sooner. Also, leaving some bread crumb comments to future maintainers of core reflection annotations in that implementation package would be helpful too, e.g. "// used by libgraal".

HTH

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1512343062

From fyang at openjdk.org Tue Apr 18 03:33:35 2023
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 18 Apr 2023 03:33:35 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v61]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Apr 2023 20:10:38 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>> 
>> The lock-stack is fixed size, currently with 8 elements.
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
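
As a reading aid for the quoted description, the per-thread lock-stack can be modelled in a few lines. This is only a toy model written in Java for illustration; the actual change lives in HotSpot's C++ and assembly code and is not reproduced here. `LockStackSketch` and its method names are hypothetical. The capacity of 8, the overflow fallback, and the refusal of recursive entries are taken from the description; removing from the middle of the array is an assumption of this sketch.

```java
// Toy model of a per-thread lock-stack; hypothetical names, not HotSpot code.
public class LockStackSketch {
    static final int CAPACITY = 8;                 // size quoted in the description
    private final Object[] slots = new Object[CAPACITY];
    private int top = 0;                           // number of occupied slots

    /** Returns false when the caller would have to take the slow path (inflate). */
    boolean tryPush(Object o) {
        if (top == CAPACITY || contains(o)) {      // overflow, or recursive attempt
            return false;
        }
        slots[top++] = o;
        return true;
    }

    /** "Does the current thread own me?": a short identity scan, newest first. */
    boolean contains(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == o) {
                return true;
            }
        }
        return false;
    }

    /** Removes o and closes the gap; returns false if o is not on the stack. */
    boolean remove(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (slots[i] == o) {
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null;               // clear the vacated last slot
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LockStackSketch stack = new LockStackSketch();
        Object lock = new Object();
        System.out.println(stack.tryPush(lock));   // first entry is accepted
        System.out.println(stack.tryPush(lock));   // recursive entry is refused
    }
}
```

Because the array is tiny, the linear scan in `contains` stays cheap, which is the property the description relies on for the common ownership query.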
>>
>> Testing:
>>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> |  | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> |  | x86_64 |  |  |  | aarch64 |  |  |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> |  | stack-locking | fast-locking |  |  | stack-locking | fast-locking |  |
>> | AkkaUct | 841.884 | 836.948 | 0.59% |  | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% |  | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% |  | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% |  | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% |  | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% |  | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% |  | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% |  | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% |  | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% |  | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% |  | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% |  | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% |  | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% |  | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% |  | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% |  | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% |  | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |  | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% |  | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% |  | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% |  | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simple build fix for extra arches

Hello, please add a few more changes for riscv making use of the 'test_bit' introduced recently. This function will use the 'bexti' instruction from the Zbs extension to do single-bit testing when available.

Tier1-3 tested on linux-riscv64 unmatched boards with LockingMode set to LM_LIGHTWEIGHT.

[riscv-test_bit.txt](https://github.com/openjdk/jdk/files/11257602/riscv-test_bit.txt)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1512382864

From iklam at openjdk.org Tue Apr 18 05:13:10 2023
From: iklam at openjdk.org (Ioi Lam)
Date: Tue, 18 Apr 2023 05:13:10 GMT
Subject: RFR: 8298048: Combine CDS archive heap into a single block [v6]
In-Reply-To:
References:
Message-ID:

> This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests.
>
> **Notes for reviewers:**
> - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects.
> - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer().
> - It might be easier to see the diff with whitespace changes hidden.
> - There are two major changes in the G1 code
>     - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770))
>     - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6)
> - Testing changes:
>     - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed.
>     - Many tests that use this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant.
>
> **Testing:**
> - Mach5 tiers 1 ~ 7

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:

 - fixed merge
 - Merge branch 'master' into 8298048-combine-cds-heap-to-single-region-PUSH
 - removed G1CollectedHeap::fill_archive_regions() -- we no longer have unused space at the start of the "old" regions
 - Simplified of runtime range of the mapped archive heap
 - @ashu-mehra comments; some clean up
 - @ashu-mehra comments
 - Merge branch 'master' into 8298048-combine-cds-heap-to-single-region-PUSH
 - more clean up: heap_regions -> heap_region, etc
 - @matias9927 comments
 - Remove archive region types from G1
 - ... and 2 more: https://git.openjdk.org/jdk/compare/e3ece365...3543dd2a

-------------

Changes: https://git.openjdk.org/jdk/pull/13284/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=05
  Stats: 3152 lines in 77 files changed: 121 ins; 2371 del; 660 mod
  Patch: https://git.openjdk.org/jdk/pull/13284.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284

PR: https://git.openjdk.org/jdk/pull/13284

From dnsimon at openjdk.org Tue Apr 18 07:03:45 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 18 Apr 2023 07:03:45 GMT
Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Apr 2023 01:06:31 GMT, Joe Darcy wrote:

>> Just above (line 228) you can see a call to this method where the argument comes from `Map.values()` whose type is `Collection` so I'd prefer to leave it as is rather than have to convert the argument to a `List`.
>
> In that case, I think the comment on line 232 should be updated to not say "list".

I changed the comment to "Encodes annotations to a byte array."

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12810#discussion_r1169582963

From dnsimon at openjdk.org Tue Apr 18 07:27:47 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 18 Apr 2023 07:27:47 GMT
Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v9]
In-Reply-To:
References:
Message-ID:

> This PR extends JVMCI with a new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are:
> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`.
> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal.
>
> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements):
>
>     ResolvedJavaMethod method = ...;
>     ExplodeLoop a = method.getAnnotation(ExplodeLoop.class);
>     return switch (a.kind()) {
>         case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL;
>         case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     }
>
>
> The same code using the new API:
>
>
>     ResolvedJavaMethod method = ...;
>     ResolvedJavaType explodeLoopType = ...;
>     AnnotationData a = method.getAnnotationDataFor(explodeLoopType);
>     return switch (a.getEnum("kind").getName()) {
>         case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL;
>         case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     }
>
>
> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
Doug Simon has updated the pull request incrementally with two additional commits since the last revision:

 - added breadcrumb in AnnotationParser about considering JVMCI should new annotation element types be added
 - fixed javadoc comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12810/files
  - new: https://git.openjdk.org/jdk/pull/12810/files/362738a6..bad23a0c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=07-08

  Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12810.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12810/head:pull/12810

PR: https://git.openjdk.org/jdk/pull/12810

From dnsimon at openjdk.org Tue Apr 18 07:27:49 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 18 Apr 2023 07:27:49 GMT
Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Apr 2023 02:22:11 GMT, Joe Darcy wrote:

> I think the code should reject it

The `AnnotationData` constructor already has a check for unknown annotation element types so I think this concern is covered.

> leaving some bread crumb comments to future maintainers of core reflection annotations

Done here: https://github.com/openjdk/jdk/pull/12810/commits/bad23a0c90743f7568c3b8ed5b57350e62503dcf

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1512577686

From jsjolen at openjdk.org Tue Apr 18 09:01:57 2023
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 18 Apr 2023 09:01:57 GMT
Subject: RFR: JDK-8301497: Replace NULL with nullptr in cpu/s390
In-Reply-To:
References:
Message-ID: <44xlRkI9MJEEzXGnJI4WSNQX5kr5TjLvgz8Zcd1mpe0=.25e9272d-d55a-42e5-9c68-91e2581315cb@github.com>

On Tue, 31 Jan 2023 11:40:09 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390.
Unfortunately the script that does the change isn't perfect, and so we
> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12325#issuecomment-1512717105

From jsjolen at openjdk.org Tue Apr 18 09:01:58 2023
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 18 Apr 2023 09:01:58 GMT
Subject: Integrated: JDK-8301497: Replace NULL with nullptr in cpu/s390
In-Reply-To:
References:
Message-ID: <9JPQZ6OyUuHN0cLIzGzEJTtBN8sXFaCBce9iQlbk4-Q=.7b950b60-eb72-4e43-8d82-9dbe5e01aa67@github.com>

On Tue, 31 Jan 2023 11:40:09 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/s390. Unfortunately the script that does the change isn't perfect, and so we
> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

This pull request has now been integrated.

Changeset: 54f7b6ca
Author:    Johan Sjölen
URL:       https://git.openjdk.org/jdk/commit/54f7b6ca34986cc26c5b91c6724b9a1754c94391
Stats:     452 lines in 44 files changed: 0 ins; 0 del; 452 mod

8301497: Replace NULL with nullptr in cpu/s390

Reviewed-by: amitkumar, coleenp

-------------

PR: https://git.openjdk.org/jdk/pull/12325

From amitkumar at openjdk.org Tue Apr 18 09:37:57 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 18 Apr 2023 09:37:57 GMT
Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v3]
In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com>
References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com>
Message-ID:

> This PR cleans up some assert statements and specifies the branch condition at the call site itself. The remaining asm_assert methods are inlined as well.

Amit Kumar has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 14 commits: - Merge branch 'master' into simplify_assert - added id to illtrap - address lutz comment & revert inlining - Revert "inline asm_assert" - Revert "inlined assert method" - Revert "added inline keyword before function implementation" - added inline keyword before function implementation - inlined assert method - relocation code shortening - inline asm_assert - ... and 4 more: https://git.openjdk.org/jdk/compare/896207de...cbfe5172 ------------- Changes: https://git.openjdk.org/jdk/pull/12822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12822&range=02 Stats: 87 lines in 9 files changed: 7 ins; 39 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/12822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12822/head:pull/12822 PR: https://git.openjdk.org/jdk/pull/12822 From jsjolen at openjdk.org Tue Apr 18 10:05:45 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Apr 2023 10:05:45 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ Message-ID: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. Here are some typical things to look out for: 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. 
An example of this:

```c++
// This function returns null
void* ret_null();
// This function returns true if *x == nullptr
bool is_nullptr(void** x);
```

Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.

Thanks!

-------------

Commit messages:
 - Fixes
 - Merge remote-tracking branch 'origin/master' into JDK-8301223
 - Replace NULL with nullptr in share/gc/g1/

Changes: https://git.openjdk.org/jdk/pull/12248/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12248&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8301223
  Stats: 849 lines in 83 files changed: 0 ins; 0 del; 849 mod
  Patch: https://git.openjdk.org/jdk/pull/12248.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12248/head:pull/12248

PR: https://git.openjdk.org/jdk/pull/12248

From tschatzl at openjdk.org Tue Apr 18 10:06:04 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 18 Apr 2023 10:06:04 GMT
Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/
In-Reply-To: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com>
References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com>
Message-ID: <95mU0bUrdokPM5fGAqrusKCv9ZbFJmAKg_hS3lYj8Ik=.7110ed19-1978-44a4-9433-e584228b9a17@github.com>

On Fri, 27 Jan 2023 10:06:10 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we
> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3.
nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

Changes requested by tschatzl (Reviewer).

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 46: > 44: const Type **fields = TypeTuple::fields(2); > 45: fields[TypeFunc::Parms+0] = TypeInstPtr::NOTnullptr; // original field value > 46: fields[TypeFunc::Parms+1] = TypeRawPtr::NOTnullptr; // thread Suggestion: fields[TypeFunc::Parms+0] = TypeInstPtr::NOTNULL; // original field value fields[TypeFunc::Parms+1] = TypeRawPtr::NOTNULL; // thread src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 59: > 57: const Type **fields = TypeTuple::fields(2); > 58: fields[TypeFunc::Parms+0] = TypeRawPtr::NOTnullptr; // Card addr > 59: fields[TypeFunc::Parms+1] = TypeRawPtr::NOTnullptr; // thread Suggestion: fields[TypeFunc::Parms+0] = TypeRawPtr::NOTNULL; // Card addr fields[TypeFunc::Parms+1] = TypeRawPtr::NOTNULL; // thread src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 77: > 75: * > 76: * If the previous value is nullptr there is no need to save the old value. > 77: * References that are nullptr are filtered during runtime by the barrier Suggestion: * If the previous value is null there is no need to save the old value.
* References that are null are filtered during runtime by the barrier src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 81: > 79: * > 80: * However in the case of newly allocated objects it might be possible to > 81: * prove that the reference about to be overwritten is nullptr during compile Suggestion: * prove that the reference about to be overwritten is null during compile src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 251: > 249: } > 250: > 251: // if (pre_val != null) Suggestion: // if (pre_val != nullptr) This is converted wrongly in a comment src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 253: > 251: // if (pre_val != null) > 252: __ if_then(pre_val, BoolTest::ne, kit->null()); { > 253: Node* buffer = __ load(__ ctrl(), buffer_adr, TypeRawPtr::NOTnullptr, T_ADDRESS, Compile::AliasIdxRaw); Suggestion: Node* buffer = __ load(__ ctrl(), buffer_adr, TypeRawPtr::NOTNULL, T_ADDRESS, Compile::AliasIdxRaw); src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 273: > 271: __ make_leaf_call(tf, CAST_FROM_FN_PTR(address, G1BarrierSetRuntime::write_ref_field_pre_entry), "write_ref_field_pre_entry", pre_val, tls); > 272: } __ end_if(); // (!index) > 273: } __ end_if(); // (pre_val != null) Suggestion: } __ end_if(); // (pre_val != nullptr) src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 291: > 289: * To reduce the number of updates to the remembered set the post-barrier > 290: * filters updates to fields in objects located in the Young Generation, > 291: * the same region as the reference, when the nullptr is being written or Suggestion: * the same region as the reference, when null is being written or src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 382: > 380: // If we are writing a null then we need no post barrier > 381: > 382: if (val != nullptr && val->is_Con() && val->bottom_type() == TypePtr::nullptr_PTR) { Suggestion: if (val != nullptr && val->is_Con() && val->bottom_type() == TypePtr::NULL_PTR) { src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 
385: > 383: // Must be null > 384: const Type* t = val->bottom_type(); > 385: assert(t == Type::TOP || t == TypePtr::nullptr_PTR, "must be null"); Suggestion: assert(t == Type::TOP || t == TypePtr::NULL_PTR, "must be null"); src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 386: > 384: const Type* t = val->bottom_type(); > 385: assert(t == Type::TOP || t == TypePtr::nullptr_PTR, "must be null"); > 386: // No post barrier if writing nullx Suggestion: // No post barrier if writing null. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 437: > 435: // potentially reset these fields in the JavaThread. > 436: Node* index = __ load(__ ctrl(), index_adr, TypeX_X, TypeX_X->basic_type(), Compile::AliasIdxRaw); > 437: Node* buffer = __ load(__ ctrl(), buffer_adr, TypeRawPtr::NOTnullptr, T_ADDRESS, Compile::AliasIdxRaw); Suggestion: Node* buffer = __ load(__ ctrl(), buffer_adr, TypeRawPtr::NOTNULL, T_ADDRESS, Compile::AliasIdxRaw); src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 462: > 460: __ if_then(xor_res, BoolTest::ne, zeroX, likely); { > 461: > 462: // No barrier if we are storing a null Suggestion: // No barrier if we are storing a null. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 342: > 340: do { \ > 341: assert_at_safepoint(); \ > 342: assert(Thread::current_or_null() != nullptr, "no current thread"); \ Suggestion: assert(Thread::current_or_null() != nullptr, "no current thread"); \ src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 509: > 507: // to support an allocation of the given "word_size". If > 508: // successful, perform the allocation and return the address of the > 509: // allocated block, or else "null". Suggestion: // allocated block, or else null. 
src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1921: > 1919: assert(limit == bottom, > 1920: "the region limit should be at bottom"); > 1921: // we return null and the caller should try calling Suggestion: // We return null and the caller should try calling src/hotspot/share/gc/g1/g1FreeIdSet.cpp line 36: > 34: G1FreeIdSet::G1FreeIdSet(uint start, uint size) : > 35: _sem(size), // counting semaphore for available ids > 36: _next(nullptr), // array of "next" indices Suggestion: _next(nullptr), // array of "next" indices src/hotspot/share/gc/g1/g1RemSet.cpp line 1502: > 1500: > 1501: // If the card is no longer dirty, nothing to do. > 1502: // We cannot load the card value before the "r == null" check, because G1 Suggestion: // We cannot load the card value before the "r == nullptr" check above, because G1 ------------- PR Review: https://git.openjdk.org/jdk/pull/12248#pullrequestreview-1274727932 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090343260 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090343530 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090341235 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090341394 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090342158 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090342475 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090342663 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090344299 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090344705 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090345163 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090345368 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090345759 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090346018 PR Review Comment: 
https://git.openjdk.org/jdk/pull/12248#discussion_r1090348911 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090349649 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090351098 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090353074 PR Review Comment: https://git.openjdk.org/jdk/pull/12248#discussion_r1090356716 From jsjolen at openjdk.org Tue Apr 18 10:06:05 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Apr 2023 10:06:05 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ In-Reply-To: <95mU0bUrdokPM5fGAqrusKCv9ZbFJmAKg_hS3lYj8Ik=.7110ed19-1978-44a4-9433-e584228b9a17@github.com> References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> <95mU0bUrdokPM5fGAqrusKCv9ZbFJmAKg_hS3lYj8Ik=.7110ed19-1978-44a4-9433-e584228b9a17@github.com> Message-ID: On Mon, 30 Jan 2023 09:23:11 GMT, Thomas Schatzl wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. 
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Changes requested by tschatzl (Reviewer).

Hi @tschatzl , thank you for reviewing this. I've added your fixes and some of my own.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12248#issuecomment-1512794124

From rehn at openjdk.org Tue Apr 18 10:42:46 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 18 Apr 2023 10:42:46 GMT
Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call
In-Reply-To:
References:
Message-ID:

On Fri, 14 Apr 2023 13:45:12 GMT, Fredrik Bredberg wrote:

> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned.
>
> This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls.
>
> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing.
>
> By both freezing and thawing the padding we remove the need to change any relative offsets at runtime.
>
> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity-tested using QEMU.

Thanks for this @fbredber !
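The effect of the padding on relative offsets can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyFrame`, the word layout (frame body, then padding, then locals) and the helper names are hypothetical and do not mirror the real `copy_to_chunk()` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::intptr_t;

// Toy interpreter frame, lowest index first: [frame body][padding][locals].
struct ToyFrame {
    std::size_t body_words;     // the "rest of the stack frame"
    std::size_t padding_words;  // alignment padding (0 or 1 word in this model)
    std::vector<Word> words;    // body, then padding, then locals
};

// Two copies that skip the padding: the frozen image is shorter than the
// frame, so the locals end up at a different offset than on the stack.
std::vector<Word> freeze_two_copies(const ToyFrame& f) {
    assert(f.body_words + f.padding_words <= f.words.size());
    const auto body_end = f.words.begin() + static_cast<std::ptrdiff_t>(f.body_words);
    const auto locals_begin = body_end + static_cast<std::ptrdiff_t>(f.padding_words);
    std::vector<Word> chunk(f.words.begin(), body_end);      // copy 1: frame body
    chunk.insert(chunk.end(), locals_begin, f.words.end());  // copy 2: locals
    return chunk;
}

// One copy including the padding: the frozen image has the same layout as
// the frame, so offsets relative to the frame start stay valid.
std::vector<Word> freeze_single_copy(const ToyFrame& f) {
    return f.words;
}

// Offset (in words) of the first local, relative to the start of the frame.
std::size_t locals_offset_on_stack(const ToyFrame& f) {
    return f.body_words + f.padding_words;
}
```

In this model the single-copy image can be indexed with the same `locals_offset_on_stack()` value as the live frame, while the two-copy image needs the offset reduced by `padding_words`, which is the kind of adjustment the description above wants to avoid.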
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13477#issuecomment-1512850919

From mdoerr at openjdk.org Tue Apr 18 10:44:03 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 18 Apr 2023 10:44:03 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22]
In-Reply-To:
References:
Message-ID:

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>
> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`.
This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - Adaptation for JDK-8305668 - Merge remote-tracking branch 'origin' into PPC64_Panama - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. - Adaptation for JDK-8303022. - Adaptation for JDK-8303684. - Merge branch 'openjdk:master' into PPC64_Panama - Merge branch 'master' into PPC64_Panama - Fix Copyright format. - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. - Allow TestHFA to run on musl. Add Upcalls. - ... 
and 14 more: https://git.openjdk.org/jdk/compare/3bba8995...725732a0

-------------

Changes: https://git.openjdk.org/jdk/pull/12708/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=21
  Stats: 2440 lines in 62 files changed: 2330 ins; 1 del; 109 mod
  Patch: https://git.openjdk.org/jdk/pull/12708.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708

PR: https://git.openjdk.org/jdk/pull/12708

From jsjolen at openjdk.org Tue Apr 18 11:41:53 2023
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 18 Apr 2023 11:41:53 GMT
Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ [v2]
In-Reply-To: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com>
References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com>
Message-ID:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we
> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Missed fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12248/files - new: https://git.openjdk.org/jdk/pull/12248/files/7cb50605..9bb7edbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12248&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12248&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12248/head:pull/12248 PR: https://git.openjdk.org/jdk/pull/12248 From aph at openjdk.org Tue Apr 18 12:03:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 18 Apr 2023 12:03:45 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 13:45:12 GMT, Fredrik Bredberg wrote: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. > > This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing. > > By both freezing and thawing the padding we remove the need to change any relative offsets at runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity tested using QEMU.
src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 149: > 147: // on AARCH64, we may insert padding between the locals and the rest of the frame > 148: // (see TemplateInterpreterGenerator::generate_normal_entry, and AbstractInterpreter::layout_activation) > 149: // since we freeze the padding word (see recurse_freeze_interpreted_frame) in order to keep the same relativized Suggestion: // because we freeze the padding word (see recurse_freeze_interpreted_frame) in order to keep the same relativized ... for clarity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1169926915 From aturbanov at openjdk.org Tue Apr 18 12:11:43 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 18 Apr 2023 12:11:43 GMT Subject: RFR: 8304896: Update to use jtreg 7.2 In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 14:56:16 GMT, Christian Stein wrote: > Please review the change to update to using jtreg 7.2. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. Interesting, why is this JBS ticket considered a bug?
------------- PR Comment: https://git.openjdk.org/jdk/pull/13496#issuecomment-1512977295 From aph at openjdk.org Tue Apr 18 12:51:46 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 18 Apr 2023 12:51:46 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: <4GITjfS1eT5rMwMKT6-ZdSEN6KXZQms8CaKSfK_rAoE=.211c7cd5-bb14-4bfa-b5a0-23ead8251206@github.com> References: <4GITjfS1eT5rMwMKT6-ZdSEN6KXZQms8CaKSfK_rAoE=.211c7cd5-bb14-4bfa-b5a0-23ead8251206@github.com> Message-ID: On Mon, 17 Apr 2023 14:44:28 GMT, Richard Reingruber wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. >> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets at runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity tested using QEMU. > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2157: > >> 2155: >> 2156: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. >> 2157: // On those architectures we thaw the padding in order to keep the same localized pointer values. > > Somehow this suggestion was not delivered... > Suggestion: > > // On those architectures we thaw the padding in order to keep the same relative references. This is very unclear.
I think it means the same "relative pointers," otherwise known as "offsets." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1169987372 From duke at openjdk.org Tue Apr 18 12:59:43 2023 From: duke at openjdk.org (Afshin Zafari) Date: Tue, 18 Apr 2023 12:59:43 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new Message-ID: - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` that _do not_ return `nullptr`. - The `-fcheck-new` is removed from the gcc compile flags. - The `operator new` and `operator delete` are deleted from `StackObj`. - The `GrowableArrayCHeap::operator delete` is added to match its corresponding allocation function `AnyObj::operator new`, because gcc complains about that after the `-fcheck-new` flag is removed. - The `Thread::operator new` overloads with and without `null` return are removed. ### Tests local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 mach5: tiers 1-5 ------------- Commit messages: - 8305590: Remove nothrow exception specifications from operator new - 8305590: Remove nothrow exception specifications from operator new Changes: https://git.openjdk.org/jdk/pull/13498/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13498&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305590 Stats: 34 lines in 7 files changed: 3 ins; 8 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/13498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13498/head:pull/13498 PR: https://git.openjdk.org/jdk/pull/13498 From mgronlun at openjdk.org Tue Apr 18 14:30:41 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 18 Apr 2023 14:30:41 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 Message-ID: Greetings, With [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967), much refactoring was done to the JVMTI code concerning agents.
However, some platforms do not have JVMTI support, and tier5 of testing builds an embedded build, linux-arm32-open-cmp-baseline, which failed because the refactoring did not properly handle conditional compilations for JVMTI. JDK-8257967 did run tier5, but it used an existing build, so it did not cause recompilations of the embedded target :-( This changeset adds the conditional constructs to let linux-arm32-open-cmp-baseline build successfully. It does not look good, but there you go... Testing: Building: linux-arm32-open-cmp-baseline Building: regular platforms Thanks Markus ------------- Commit messages: - conditional constructs for embedded Changes: https://git.openjdk.org/jdk/pull/13512/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13512&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306282 Stats: 29 lines in 3 files changed: 2 ins; 7 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/13512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13512/head:pull/13512 PR: https://git.openjdk.org/jdk/pull/13512 From egahlin at openjdk.org Tue Apr 18 14:40:41 2023 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 18 Apr 2023 14:40:41 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 14:22:21 GMT, Markus Grönlund wrote: > Greetings, > > With [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967), much refactoring was done to the JVMTI code concerning agents. However, some platforms do not have JVMTI support, and tier5 of testing builds an embedded build, linux-arm32-open-cmp-baseline, which failed because the refactoring did not properly handle conditional compilations for JVMTI. > > JDK-8257967 did run tier5, but it used an existing build, so it did not cause recompilations of the embedded target :-( > > This changeset adds the conditional constructs to let linux-arm32-open-cmp-baseline build successfully.
> > It does not look good, but there you go... > > Testing: > > Building: linux-arm32-open-cmp-baseline > Building: regular platforms > > Thanks > Markus Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13512#pullrequestreview-1390279307 From jsjolen at openjdk.org Tue Apr 18 14:51:46 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 18 Apr 2023 14:51:46 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ [v2] In-Reply-To: References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> Message-ID: <9LxTmPMhjcrQHqTyg-RDtbHd_bQZcoHpxF7ji9qygak=.d96ae6cf-146a-4d9e-9b44-cf8c7f38348f@github.com> On Tue, 18 Apr 2023 11:41:53 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks!
> > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Missed fix Hm, got some funny errors: === Output from failing command(s) repeated here === * For target hotspot_variant-server_libjvm_objs_filemap.o: /home/runner/work/jdk/jdk/src/hotspot/share/cds/filemap.cpp: In member function 'void FileMapInfo::seek_to_position(size_t)': /home/runner/work/jdk/jdk/src/hotspot/share/cds/filemap.cpp:1429:50: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1429 | log_error(cds)("Unable to seek to position %ld", pos); | ~~^ ~~~ | | | | | size_t {aka unsigned int} | long int | %d cc1plus: all warnings being treated as errors I'll have to look into the merge and see if I messed that up. Worst case, I'll re-apply the nullptr conversion to origin/master. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12248#issuecomment-1513291230 From mgronlun at openjdk.org Tue Apr 18 15:02:15 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 18 Apr 2023 15:02:15 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 [v2] In-Reply-To: References: Message-ID: > Greetings, > > With [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967), much refactoring was done to the JVMTI code concerning agents. However, some platforms do not have JVMTI support, and tier5 of testing builds an embedded build, linux-arm32-open-cmp-baseline, which failed because the refactoring did not properly handle conditional compilations for JVMTI. > > JDK-8257967 did run tier5, but it used an existing build, so it did not cause recompilations of the embedded target :-( > > This changeset adds the conditional constructs to let linux-arm32-open-cmp-baseline build successfully. > > It does not look good, but there you go...
> > Testing: > > Building: linux-arm32-open-cmp-baseline > Building: regular platforms > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: fix CDS format specifier to get GitHub actions working again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13512/files - new: https://git.openjdk.org/jdk/pull/13512/files/70bc2625..80e3d33d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13512&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13512&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13512/head:pull/13512 PR: https://git.openjdk.org/jdk/pull/13512 From jjg at openjdk.org Tue Apr 18 15:18:39 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Tue, 18 Apr 2023 15:18:39 GMT Subject: RFR: 8304896: Update to use jtreg 7.2 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 12:08:50 GMT, Andrey Turbanov wrote: > Interesting, why is this JBS ticket considered a bug? There's no obvious best choice here (bug, enhancement, task) and as is, it was the same as for similar previous items. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13496#issuecomment-1513340471 From mgronlun at openjdk.org Tue Apr 18 15:22:02 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 18 Apr 2023 15:22:02 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 [v3] In-Reply-To: References: Message-ID: <7Mtu4sDUSEqgO7yYhYPVT-aU0XmHOObSRnPOQqDcIic=.443775c9-0d30-4c6f-9815-0133f57a2e00@github.com> > Greetings, > > With [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967), much refactoring was done to the JVMTI code concerning agents.
However, some platforms do not have JVMTI support, and tier5 of testing builds an embedded build, linux-arm32-open-cmp-baseline, which failed because the refactoring did not properly handle conditional compilations for JVMTI. > > JDK-8257967 did run tier5, but it used an existing build, so it did not cause recompilations of the embedded target :-( > > This changeset adds the conditional constructs to let linux-arm32-open-cmp-baseline build successfully. > > It does not look good, but there you go... > > Testing: > > Building: linux-arm32-open-cmp-baseline > Building: regular platforms > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: SIZE_FORMAT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13512/files - new: https://git.openjdk.org/jdk/pull/13512/files/80e3d33d..9b231f4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13512&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13512&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13512/head:pull/13512 PR: https://git.openjdk.org/jdk/pull/13512 From coleenp at openjdk.org Tue Apr 18 15:22:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Apr 2023 15:22:05 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 17:09:44 GMT, Afshin Zafari wrote: > - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` that _do not_ return `nullptr`. > > - The `-fcheck-new` is removed from the gcc compile flags. > > - The `operator new` and `operator delete` are deleted from `StackObj`.
> > - The `GrowableArrayCHeap::operator delete` is added to match its corresponding allocation function `AnyObj::operator new`, because gcc complains about that after the `-fcheck-new` flag is removed. > - The `Thread::operator new` overloads with and without `null` return are removed. > > ### Tests > local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 > mach5: tiers 1-5 Changes requested by coleenp (Reviewer). src/hotspot/share/memory/allocation.hpp line 289: > 287: void* operator new [](size_t size) throw() = delete; > 288: void operator delete(void* p) = delete; > 289: void operator delete [](void* p) = delete; Nice. src/hotspot/share/runtime/thread.hpp line 203: > 201: static bool is_JavaThread_protected_by_TLH(const JavaThread* target); > 202: > 203: void operator delete(void* p); Should you have removed delete and Thread::allocate() also? Is Thread::allocate now unused? ------------- PR Review: https://git.openjdk.org/jdk/pull/13498#pullrequestreview-1390368704 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170196398 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170197505 From iklam at openjdk.org Tue Apr 18 15:29:08 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Apr 2023 15:29:08 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 [v3] In-Reply-To: <7Mtu4sDUSEqgO7yYhYPVT-aU0XmHOObSRnPOQqDcIic=.443775c9-0d30-4c6f-9815-0133f57a2e00@github.com> References: <7Mtu4sDUSEqgO7yYhYPVT-aU0XmHOObSRnPOQqDcIic=.443775c9-0d30-4c6f-9815-0133f57a2e00@github.com> Message-ID: On Tue, 18 Apr 2023 15:22:02 GMT, Markus Grönlund wrote: >> Greetings, >> >> With [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967), much refactoring was done to the JVMTI code concerning agents.
However, some platforms do not have JVMTI support, and tier5 of testing builds an embedded build, linux-arm32-open-cmp-baseline, which failed because the refactoring did not properly handle conditional compilations for JVMTI. >> >> JDK-8257967 did run tier5, but it used an existing build, so it did not cause recompilations of the embedded target :-( >> >> This changeset adds the conditional constructs to let linux-arm32-open-cmp-baseline build successfully. >> >> It does not look good, but there you go... >> >> Testing: >> >> Building: linux-arm32-open-cmp-baseline >> Building: regular platforms >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > SIZE_FORMAT LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13512#pullrequestreview-1390388100 From coleenp at openjdk.org Tue Apr 18 15:31:01 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Apr 2023 15:31:01 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock Message-ID: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> This patch releases the InvokeMethodTable_lock while creating a method handle intrinsic. If there's a race, it frees a Method created by the racing thread. The logic is simple but uses the deallocate_list infrastructure that's mostly used for redefinition, making it less rare. With Dacapo2009, this adds about 20 Methods + constant pools to the list. Also, the method has to call nmethod->flush, which is assumed to be something only GC calls. Tested with tier1-4.
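For readers following 8305252, the locking pattern described above (look up under the lock, create with the lock released, then re-check under the lock and drop the duplicate if another thread won) can be sketched in standalone C++. This is only an illustration under stated assumptions: `Entry`, `IntrinsicTable`, `find_or_create` and `discarded` are hypothetical names, `std::mutex` stands in for `InvokeMethodTable_lock`, and the losing copy is simply freed here, whereas the patch is described as going through the `deallocate_list` infrastructure.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the Method created for an intrinsic.
struct Entry {
  std::string name;
};

// Hypothetical table; _lock plays the role of InvokeMethodTable_lock.
class IntrinsicTable {
  std::mutex _lock;
  std::map<std::string, std::unique_ptr<Entry>> _table;
  int _discarded = 0;  // entries dropped after losing a race

 public:
  Entry* find_or_create(const std::string& key) {
    {
      std::lock_guard<std::mutex> guard(_lock);
      auto it = _table.find(key);
      if (it != _table.end()) return it->second.get();
    }  // lock released here

    // Creation runs without the lock, so it may call out to arbitrary code.
    std::unique_ptr<Entry> fresh(new Entry{key});

    std::lock_guard<std::mutex> guard(_lock);
    auto it = _table.find(key);
    if (it != _table.end()) {
      // Lost the race: keep the winner's entry; `fresh` is freed on return.
      ++_discarded;
      return it->second.get();
    }
    Entry* result = fresh.get();
    _table[key] = std::move(fresh);
    return result;
  }

  int discarded() {
    std::lock_guard<std::mutex> guard(_lock);
    return _discarded;
  }
};
```

The cost of this shape is that two threads may both do the creation work; the benefit is that nothing expensive or re-entrant runs while the lock is held.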
------------- Commit messages: - 8305252: make_method_handle_intrinsic may call java code under a lock Changes: https://git.openjdk.org/jdk/pull/13308/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13308&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305252 Stats: 43 lines in 3 files changed: 15 ins; 6 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/13308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13308/head:pull/13308 PR: https://git.openjdk.org/jdk/pull/13308 From mgronlun at openjdk.org Tue Apr 18 16:03:57 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 18 Apr 2023 16:03:57 GMT Subject: Integrated: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 14:22:21 GMT, Markus Grönlund wrote: > Greetings, > > With [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967), much refactoring was done to the JVMTI code concerning agents. However, some platforms do not have JVMTI support, and tier5 of testing builds an embedded build, linux-arm32-open-cmp-baseline, which failed because the refactoring did not properly handle conditional compilations for JVMTI. > > JDK-8257967 did run tier5, but it used an existing build, so it did not cause recompilations of the embedded target :-( > > This changeset adds the conditional constructs to let linux-arm32-open-cmp-baseline build successfully. > > It does not look good, but there you go... > > Testing: > > Building: linux-arm32-open-cmp-baseline > Building: regular platforms > > Thanks > Markus This pull request has now been integrated.
Changeset: 0f3828dd Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/0f3828dddd8d4a08677efcd15aa8dfde18540130 Stats: 29 lines in 3 files changed: 2 ins; 7 del; 20 mod 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 Reviewed-by: egahlin, iklam ------------- PR: https://git.openjdk.org/jdk/pull/13512 From aturbanov at openjdk.org Tue Apr 18 16:42:13 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 18 Apr 2023 16:42:13 GMT Subject: RFR: 8304896: Update to use jtreg 7.2 In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 14:56:16 GMT, Christian Stein wrote: > Please review the change to update to using jtreg 7.2. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. The 7.1 update was an enhancement: https://bugs.openjdk.org/browse/JDK-8296710 The 7.1.1 update was a bug because of https://bugs.openjdk.org/browse/CODETOOLS-7903390 I think enhancement suits better here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13496#issuecomment-1513478868 From mgronlun at openjdk.org Tue Apr 18 17:07:12 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 18 Apr 2023 17:07:12 GMT Subject: RFR: 8306278: jvmtiAgentList.cpp:253 assert(offset >= 0) failed: invariant occurs on AIX after JDK-8257967 Message-ID: Greetings, For most platforms, os::dll_address_to_library_name() only sets offset = -1 in case of errors. If there is an error, the function returns false. This is fine. On AIX, the offset, being optional, is invariantly set to -1, even in the case of non-errors. It is easiest to remove the assertion for a non-negative offset.
Thanks Markus ------------- Commit messages: - remove assertion on positive offset Changes: https://git.openjdk.org/jdk/pull/13513/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13513&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306278 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13513.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13513/head:pull/13513 PR: https://git.openjdk.org/jdk/pull/13513 From coleenp at openjdk.org Tue Apr 18 17:13:43 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Apr 2023 17:13:43 GMT Subject: RFR: 8306310: Move is_shared Klass flag Message-ID: Please review this simple patch to move the is_shared_class flag to Klass, CDS flags. The eventual goal is to have AccessFlags only be ones defined in the classfile. Tested with tier1-4. ------------- Commit messages: - 8306310: Move is_shared Klass flag Changes: https://git.openjdk.org/jdk/pull/13514/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13514&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306310 Stats: 15 lines in 2 files changed: 10 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13514/head:pull/13514 PR: https://git.openjdk.org/jdk/pull/13514 From sspitsyn at openjdk.org Tue Apr 18 17:18:25 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 18 Apr 2023 17:18:25 GMT Subject: RFR: 8306278: jvmtiAgentList.cpp:253 assert(offset >= 0) failed: invariant occurs on AIX after JDK-8257967 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 16:59:29 GMT, Markus Grönlund wrote: > Greetings, > > For most platforms, os::dll_address_to_library_name() only sets offset = -1 in case of errors. If there is an error, the function returns false. This is fine. > > On AIX, the offset, being optional, is invariantly set to -1, even in the case of non-errors.
> > Easiest to remove the assertion for a positive offset. > > Thanks > Markus Looks good to me. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13513#pullrequestreview-1390594264 From coleenp at openjdk.org Tue Apr 18 17:19:18 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Apr 2023 17:19:18 GMT Subject: RFR: 8306123: Move InstanceKlass writeable flags Message-ID: Please review this patch to move the writeable Klass AccessFlags to InstanceKlassFlags. Tested with tier1-4. ------------- Commit messages: - 8306123: Move InstanceKlass writeable flags Changes: https://git.openjdk.org/jdk/pull/13515/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13515&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306123 Stats: 114 lines in 6 files changed: 64 ins; 34 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/13515.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13515/head:pull/13515 PR: https://git.openjdk.org/jdk/pull/13515 From shade at openjdk.org Tue Apr 18 17:59:07 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Apr 2023 17:59:07 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v2] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. 
> > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Work - Draft work - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Merge branch 'master' into JDK-83050920-thread-sleep-subms - More fixes - Fix Windows yet again - Helper method should be inline, not static - Work ------------- Changes: https://git.openjdk.org/jdk/pull/13225/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=01 Stats: 277 lines in 16 files changed: 216 ins; 31 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From shade at openjdk.org Tue Apr 18 17:59:10 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Apr 2023 17:59:10 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v2] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 12 Apr 2023 07:44:38 GMT, Alan Bateman wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Work >> - Draft work >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - More fixes >> - Fix Windows yet again >> - Helper method should be inline, not static >> - Work > > src/hotspot/share/include/jvm.h line 279: > >> 277: >> 278: JNIEXPORT void JNICALL >> 279: JVM_Sleep(JNIEnv *env, jclass threadClass, jlong millis, jint nanos); > > I wonder if it would be simpler to just provide a single value, in nanoseconds, to the VM. That's enough for a sleep of 292 years. 
Windows would still need to convert to milliseconds of course but it overall would avoid sending two values down to the park code. Right, that might be better. New revision should address that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170386627 From shade at openjdk.org Tue Apr 18 17:59:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Apr 2023 17:59:12 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v2] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: <78WWvars1HHkiXvVZYVWNzUzIZCV4gEDU_mZd2BOEYI=.419a34ec-d53f-4075-ae4f-81524ad4c536@github.com> On Thu, 30 Mar 2023 01:26:36 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Work >> - Draft work >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - More fixes >> - Fix Windows yet again >> - Helper method should be inline, not static >> - Work > > src/hotspot/share/runtime/javaThread.cpp line 1981: > >> 1979: } >> 1980: >> 1981: bool JavaThread::sleep(jlong millis, jint nanos) { > > You don't need the overloads at this level - the incoming call should always have millis and nanos, even if nanos is zero. I think the distinction between millis and nanos makes our life unnecessarily hard here. `jlong nanos` is already quite big for practical uses, and so we can simplify this code by only passing `nanos` around. In new revision, I still do two overloads to avoid "unit" mistakes. 
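As a rough illustration of the "pass a single nanosecond value" idea discussed above, the conversion could look like the following sketch. It is not the code from the PR: `to_total_nanos` is a hypothetical helper, and it assumes `millis >= 0` and `0 <= nanos <= 999999`, saturating at the largest 64-bit signed value (the roughly 292 years mentioned in the review).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: fold a (millis, nanos) pair into one nanosecond count.
// Assumes millis >= 0 and 0 <= nanos <= 999999; saturates instead of overflowing.
inline int64_t to_total_nanos(int64_t millis, int32_t nanos) {
  const int64_t kNanosPerMilli = 1000000;
  const int64_t kMax = std::numeric_limits<int64_t>::max();
  // millis * kNanosPerMilli + nanos would overflow exactly when this holds.
  if (millis > (kMax - nanos) / kNanosPerMilli) {
    return kMax;
  }
  return millis * kNanosPerMilli + nanos;
}
```

A platform whose timers only take milliseconds would convert back at the last moment, rounding up so that the sleep is never shorter than requested.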
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170390101 From shade at openjdk.org Tue Apr 18 17:59:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Apr 2023 17:59:14 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v2] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 29 Mar 2023 19:57:46 GMT, Alan Bateman wrote: >> Yes, let me fix that. `TimeUnit.toNanos` handles it well itself, it seems. > >> Yes, let me fix that. `TimeUnit.toNanos` handles it well itself, it seems. > > This code is refactored in PR 13203 so we'll have to merge at some point. Merged! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170390739 From kbarrett at openjdk.org Tue Apr 18 19:02:50 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 18 Apr 2023 19:02:50 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 17:09:44 GMT, Afshin Zafari wrote: > - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. > > - The `-fcheck-new` is removed from the gcc compile flags. > > - The `operator new` and `operator delete` are deleted from `StackObj`. > > - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. > - The `Thread::operator new`with and without `null` return are removed. > > ### Tests > local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 > mach5: tiers 1-5 Changes requested by kbarrett (Reviewer). 
src/hotspot/share/jfr/utilities/jfrAllocation.hpp line 58: > 56: NOINLINE void* operator new(size_t size); > 57: NOINLINE void* operator new (size_t size, const std::nothrow_t& nothrow_constant) throw(); > 58: NOINLINE void* operator new [](size_t size); The changes to JfrCHeapObj are not correct, because these allocators currently _can_ return null. Their implementation is just to return the result of calling the non-throwing allocator. That's probably not an ideal implementation. Either the declaration needs to be left as-is or the implementation changed. src/hotspot/share/memory/allocation.hpp line 287: > 285: private: > 286: void* operator new(size_t size) throw() = delete; > 287: void* operator new [](size_t size) throw() = delete; The lingering nothrow exception-specs here are just clutter and can be removed. src/hotspot/share/memory/allocation.hpp line 289: > 287: void* operator new [](size_t size) throw() = delete; > 288: void operator delete(void* p) = delete; > 289: void operator delete [](void* p) = delete; Making these deleted functions public might provide better error messages if someone accidentally attempts to reference them. src/hotspot/share/memory/allocation.hpp line 504: > 502: // Arena allocations > 503: void* operator new(size_t size, Arena *arena); > 504: void* operator new [](size_t size, Arena *arena) = delete; `operator new[](size_t)` (down below, where github won't let me comment directly) should also have its nothrow exception-spec removed. src/hotspot/share/prims/jvmtiRawMonitor.hpp line 114: > 112: > 113: // Non-aborting operator new > 114: void* operator new(size_t size) { This change is incorrect, as this can quite obviously return null. And that seems to be intentional. Presumably the callers are checking for a possible null allocation result (else there is a bug). I think it would be less confusing if this took a `std::nothrow_t` to be explicit about its behavior, and updated the caller(s) accordingly.
That would match the usual idiom. ------------- PR Review: https://git.openjdk.org/jdk/pull/13498#pullrequestreview-1390703118 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170425313 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170429604 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170428457 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170434730 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170438594 From kbarrett at openjdk.org Tue Apr 18 19:02:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 18 Apr 2023 19:02:53 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new In-Reply-To: References: Message-ID: <-cM0_BrloWCZxEYY8rbloXrXe1_mcQscU3ghJc1TE2I=.94b5de60-cfc8-4263-b37e-dae00e4577bc@github.com> On Tue, 18 Apr 2023 15:18:34 GMT, Coleen Phillimore wrote: >> - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. >> >> - The `-fcheck-new` is removed from the gcc compile flags. >> >> - The `operator new` and `operator delete` are deleted from `StackObj`. >> >> - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. >> - The `Thread::operator new`with and without `null` return are removed. >> >> ### Tests >> local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 >> mach5: tiers 1-5 > > src/hotspot/share/runtime/thread.hpp line 203: > >> 201: static bool is_JavaThread_protected_by_TLH(const JavaThread* target); >> 202: >> 203: void operator delete(void* p); > > Should you have removed delete and Thread::allocate() also? is Thread::allocate now unused? I was thinking the same thing. 
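The "usual idiom" suggested in the review of `jvmtiRawMonitor.hpp` for 8305590 can be sketched as follows: a class-specific allocation function that takes `std::nothrow_t`, is declared `noexcept`, and may return null, so the possibility of a null result is explicit at every call site. This is an illustration under stated assumptions, not the code in the PR: `RawMonitorLike` and `make_and_read_id` are hypothetical, and `std::malloc` stands in for HotSpot's own allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class for illustration; not the real JvmtiRawMonitor.
class RawMonitorLike {
 public:
  explicit RawMonitorLike(int id) : _id(id) {}
  int id() const { return _id; }

  // Non-aborting allocation: may return nullptr, and the signature says so.
  static void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
    return std::malloc(size);
  }
  // Matching placement delete, called only if the constructor throws.
  static void operator delete(void* p, const std::nothrow_t&) noexcept {
    std::free(p);
  }
  // Usual deallocation function, called by ordinary delete expressions.
  static void operator delete(void* p) noexcept {
    std::free(p);
  }

 private:
  int _id;
};

// Callers opt in with std::nothrow and must handle a null result.
inline int make_and_read_id(int id) {
  RawMonitorLike* m = new (std::nothrow) RawMonitorLike(id);
  if (m == nullptr) {
    return -1;  // allocation failed; report instead of dereferencing
  }
  int result = m->id();
  delete m;
  return result;
}
```

Because the class declares only the `std::nothrow_t` form, a plain `new RawMonitorLike(1)` does not compile in this sketch, which forces every caller to write `new (std::nothrow)` and makes the required null check hard to overlook.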
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1170457293 From kbarrett at openjdk.org Tue Apr 18 19:34:44 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 18 Apr 2023 19:34:44 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ [v2] In-Reply-To: <9LxTmPMhjcrQHqTyg-RDtbHd_bQZcoHpxF7ji9qygak=.d96ae6cf-146a-4d9e-9b44-cf8c7f38348f@github.com> References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> <9LxTmPMhjcrQHqTyg-RDtbHd_bQZcoHpxF7ji9qygak=.d96ae6cf-146a-4d9e-9b44-cf8c7f38348f@github.com> Message-ID: On Tue, 18 Apr 2023 14:49:17 GMT, Johan Sjölen wrote: > Hm, got some funny errors: > > ``` > > === Output from failing command(s) repeated here === > * For target hotspot_variant-server_libjvm_objs_filemap.o: > /home/runner/work/jdk/jdk/src/hotspot/share/cds/filemap.cpp: In member function 'void FileMapInfo::seek_to_position(size_t)': > /home/runner/work/jdk/jdk/src/hotspot/share/cds/filemap.cpp:1429:50: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] > 1429 | log_error(cds)("Unable to seek to position %ld", pos); > | ~~^ ~~~ > | | | > | | size_t {aka unsigned int} > | long int > | %d > cc1plus: all warnings being treated as errors > ``` > > I'll have to look into the merge and see if I messed that up. Worst case, I'll re-apply the nullptr conversion to origin/master. That seems to be https://bugs.openjdk.org/browse/JDK-8306289, which has been fixed.
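The build error quoted above arises because `%ld` only matches `size_t` on targets where `size_t` and `long` happen to have the same representation; on the 32-bit target in the log, `size_t` is `unsigned int`. HotSpot has its own `SIZE_FORMAT` macro for this purpose; the sketch below instead uses the standard `%zu` specifier, purely as an assumption to keep the example self-contained. `describe_seek_failure` is a hypothetical helper, not the function in `filemap.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format a size_t portably: %zu matches size_t on 32-bit and 64-bit targets alike.
inline std::string describe_seek_failure(std::size_t pos) {
  char buf[64];  // 27 characters of text plus at most 20 digits fits comfortably
  int n = std::snprintf(buf, sizeof(buf), "Unable to seek to position %zu", pos);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
  (void)n;  // silence unused-variable warnings when assertions are compiled out
  return std::string(buf);
}
```

The same message compiles without a format warning regardless of the width of `size_t`, which is the property the `-Werror=format=` failure was enforcing.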
------------- PR Comment: https://git.openjdk.org/jdk/pull/12248#issuecomment-1513691025 From shade at openjdk.org Tue Apr 18 19:38:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Apr 2023 19:38:12 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. 
> > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix gtests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13225/files - new: https://git.openjdk.org/jdk/pull/13225/files/f4818237..33fa34f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From kbarrett at openjdk.org Tue Apr 18 19:55:47 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 18 Apr 2023 19:55:47 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ [v2] In-Reply-To: References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> Message-ID: On Tue, 18 Apr 2023 11:41:53 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Missed fix Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12248#pullrequestreview-1390833802 From iklam at openjdk.org Tue Apr 18 20:16:06 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Apr 2023 20:16:06 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v7] In-Reply-To: References: Message-ID: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them.
It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that use this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into 8298048-combine-cds-heap-to-single-region-PUSH - Fixed assert in runtime/cds/appcds/SharedArchiveConsistency.java - Removal of JFR custom closed/open archive region types - Remove g1 full gc skip marking optimization - Some comment updates - Move g1collectedheap archive related regions together in the cpp file - Factor out region/range iteration - Fix comment - Ioi fix - fixed merge - ... and 11 more: https://git.openjdk.org/jdk/compare/0f3828dd...8a35c7ee ------------- Changes: https://git.openjdk.org/jdk/pull/13284/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=06 Stats: 3252 lines in 83 files changed: 159 ins; 2446 del; 647 mod Patch: https://git.openjdk.org/jdk/pull/13284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284 PR: https://git.openjdk.org/jdk/pull/13284 From iklam at openjdk.org Tue Apr 18 21:44:41 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Apr 2023 21:44:41 GMT Subject: RFR: 8306310: Move is_shared Klass flag In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:06:03 GMT, Coleen Phillimore wrote: > Please review this simple patch to move the is_shared_class flag to Klass, CDS flags. The eventual goal is to have AccessFlags only be ones defined in the classfile. > Tested with tier1-4. LGTM. Thanks for the clean up. ------------- Marked as reviewed by iklam (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13514#pullrequestreview-1390975675 From iklam at openjdk.org Tue Apr 18 21:47:44 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Apr 2023 21:47:44 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock In-Reply-To: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> Message-ID: On Mon, 3 Apr 2023 19:33:27 GMT, Coleen Phillimore wrote: > This patch releases the InvokeMethodTable_lock while creating a method handle intrinsic. If there's a race, it frees a Method created by racing thread. The logic is simple but uses the deallocate_list infrastructure that's mostly used for redefinition making it less rare. With Dacapo2009, this adds about 20 Methods + constant pools to the list. Also the method has to call nmethod->flush which is assumed to be something only GC calls. > > Tested with tier1-4. Looks reasonable to me. Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13308#pullrequestreview-1390978585 PR Review: https://git.openjdk.org/jdk/pull/13308#pullrequestreview-1390978766 From iklam at openjdk.org Tue Apr 18 21:54:44 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Apr 2023 21:54:44 GMT Subject: RFR: 8306123: Move InstanceKlass writeable flags In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:11:25 GMT, Coleen Phillimore wrote: > Please review this patch to move the writeable Klass AccessFlags to InstanceKlassFlags. > Tested with tier1-4. LGTM. ------------- Marked as reviewed by iklam (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13515#pullrequestreview-1390984891 From dholmes at openjdk.org Wed Apr 19 02:20:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 02:20:45 GMT Subject: RFR: 8306278: jvmtiAgentList.cpp:253 assert(offset >= 0) failed: invariant occurs on AIX after JDK-8257967 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 16:59:29 GMT, Markus Grönlund wrote: > Greetings, > > For most platforms, os::dll_address_to_library_name() only sets offset = -1 in case of errors. If there is an error, the function returns false. This is fine. > > On AIX, the offset, being optional, is invariantly set to -1, even in the case of non-errors. > > Easiest to remove the assertion for a positive offset. > > Thanks > Markus Removing the assertion seems the best course of action. It is interesting to note that the two call sites for these functions that pass `offset` do not correctly handle a return of -1 (it is only printed as an informational value but it isn't printed in a suitable form). Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13513#pullrequestreview-1391171295 From dholmes at openjdk.org Wed Apr 19 02:25:54 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 02:25:54 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 [v3] In-Reply-To: <7Mtu4sDUSEqgO7yYhYPVT-aU0XmHOObSRnPOQqDcIic=.443775c9-0d30-4c6f-9815-0133f57a2e00@github.com> References: <7Mtu4sDUSEqgO7yYhYPVT-aU0XmHOObSRnPOQqDcIic=.443775c9-0d30-4c6f-9815-0133f57a2e00@github.com> Message-ID: <8bWAmer1c40K83zNOqk4t96rJ8xPG-lVyjBXpeGTp2M=.a4fead0c-23c2-4575-b5f7-5782c874fa5d@github.com> On Tue, 18 Apr 2023 15:22:02 GMT, Markus Grönlund wrote: >> Greetings, >> >> With [JDK-8257967](https://bugs.openjdk.org/browse/JDK-8257967), much refactoring was done to the JVMTI code concerning agents.
However, some platforms do not have JVMTI support, and tier5 of testing builds an embedded build, linux-arm32-open-cmp-baseline, which failed because the refactoring did not properly handle conditional compilations for JVMTI. >> >> JDK-8257967 did run tier5, but it used an existing build, so it did not cause recompilations of the embedded target :-( >> >> This changeset adds the conditional constructs to let linux-arm32-open-cmp-baseline build successfully. >> >> It does not look good, but there you go... >> >> Testing: >> >> Building: linux-arm32-open-cmp-baseline >> Building: regular platforms >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > SIZE_FORMAT @mgronlun I agree this does not look good. I'm not sure this was the right way to conditionalize the new code, rather than ensuring the callsites were conditionalized on INCLUDE_JVMTI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13512#issuecomment-1514041621 From darcy at openjdk.org Wed Apr 19 02:52:46 2023 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 19 Apr 2023 02:52:46 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v9] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 07:27:47 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal.
>> >> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - added breadcrumb in AnnotationParser about considering JVMCI should new annotation element types be added > - fixed javadoc comment Marked as reviewed by darcy (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/12810#pullrequestreview-1391189154 From dholmes at openjdk.org Wed Apr 19 06:17:52 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 06:17:52 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> On Tue, 18 Apr 2023 19:38:12 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. 
>> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fix gtests You seemed to have missed: ./cpu/x86/rdtsc_x86.cpp: JavaThread::current()->sleep(FT_SLEEP_MILLISECS); ./share/compiler/compileBroker.cpp: sleep(DeoptimizeObjectsALotInterval); so not sure how this is building ??? A few other comments below. Thanks src/hotspot/os/posix/os_posix.cpp line 1545: > 1543: > 1544: int PlatformEvent::park_nanos(jlong nanos) { > 1545: assert(0 <= nanos, "nanos are in range"); `nanos` should never be zero else you call the untimed park. src/hotspot/os/posix/park_posix.hpp line 57: > 55: void park(); > 56: int park_millis(jlong millis); > 57: int park_nanos(jlong nanos); Still not sure we need this API split but if we keep `park(jlong millis)` and just add `park_nanos(jlong nanos)` then you can avoid touching so many places in the code. src/hotspot/os/windows/os_windows.cpp line 5253: > 5251: > 5252: int PlatformEvent::park_nanos(jlong nanos) { > 5253: assert(0 <= nanos, "nanos are in range"); `nanos` should never be zero else you call the untimed park. src/hotspot/os/windows/os_windows.cpp line 5257: > 5255: // Windows timers are still quite unpredictable to handle sub-millisecond granularity. > 5256: // Instead of implementing this method, fall back to the millisecond sleep, treating > 5257: // any positive requested nanos as a full millisecond. Is this how the code currently works? src/hotspot/os/windows/os_windows.cpp line 5259: > 5257: // any positive requested nanos as a full millisecond. > 5258: jlong millis = align_up(nanos, NANOSECS_PER_MILLISEC) / NANOSECS_PER_MILLISEC; > 5259: assert(nanos == 0 || millis != 0, "Only pass zero millis on zero nanos"); Not sure what this is trying to check. 
Nit: s/on/or/ src/hotspot/share/runtime/javaThread.hpp line 1145: > 1143: public: > 1144: bool sleep_millis(jlong millis); > 1145: bool sleep_nanos(jlong nanos); I prefer just one sleep that takes nanos. src/java.base/share/classes/java/lang/Thread.java line 576: > 574: long millis = NANOSECONDS.toMillis(nanos); > 575: nanos -= MILLISECONDS.toNanos(millis); > 576: sleep(millis, (int)nanos); This double conversion seems a bit kludgy - why not just keep the vthread check and call `sleep0(nanos)`? test/jdk/java/lang/Thread/SleepSanity.java line 75: > 73: } > 74: > 75: private static void testTimes(TestCase t, long min, long max) throws Exception { Not obvious without reading all the code that min and max are in ms. `msMin` and `msMax` might be clearer. test/jdk/java/lang/Thread/SleepSanity.java line 84: > 82: } > 83: > 84: private static void testTimeout(TestCase t, long timeout) throws Exception { Suggestion: s/timeout/millis/ again so unit is clear. ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13225#pullrequestreview-1391285246 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170851742 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170827783 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170840798 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170824754 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170841030 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170846464 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170831695 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170835700 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170835903 From dholmes at openjdk.org Wed Apr 19 06:17:54 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 06:17:54 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 05:31:08 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix gtests > > src/hotspot/os/windows/os_windows.cpp line 5257: > >> 5255: // Windows timers are still quite unpredictable to handle sub-millisecond granularity. >> 5256: // Instead of implementing this method, fall back to the millisecond sleep, treating >> 5257: // any positive requested nanos as a full millisecond. > > Is this how the code currently works? Hmmm I changed that comment ... 
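The Windows fallback discussed in this exchange treats any positive nanosecond request as at least one full millisecond. A minimal sketch of that rounding follows, assuming non-negative input; `nanos_to_millis_round_up` is a hypothetical helper written with plain integer arithmetic rather than HotSpot's `align_up`, and it is not the code from `os_windows.cpp`.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kNanosPerMillisecond = 1000000;

// Round a non-negative nanosecond count up to whole milliseconds.
// Using quotient plus a remainder check avoids the overflow that
// (nanos + kNanosPerMillisecond - 1) could hit near INT64_MAX.
inline int64_t nanos_to_millis_round_up(int64_t nanos) {
  assert(nanos >= 0 && "nanos are in range");
  int64_t millis = nanos / kNanosPerMillisecond;
  if (nanos % kNanosPerMillisecond != 0) {
    millis += 1;  // any leftover nanos cost a full extra millisecond
  }
  return millis;
}
```

Under this rounding a zero request stays zero and every positive request yields a non-zero millisecond count, which appears to be the property the quoted assertion `nanos == 0 || millis != 0` is meant to check.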
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170848229 From dholmes at openjdk.org Wed Apr 19 06:17:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 06:17:55 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 06:09:28 GMT, David Holmes wrote: >> src/hotspot/os/windows/os_windows.cpp line 5257: >> >>> 5255: // Windows timers are still quite unpredictable to handle sub-millisecond granularity. >>> 5256: // Instead of implementing this method, fall back to the millisecond sleep, treating >>> 5257: // any positive requested nanos as a full millisecond. >> >> Is this how the code currently works? > > Hmmm I changed that comment ... My actual comment was changed to: > I suggest extending the comment to add This is how Thread.sleep(millis, nanos) has always behaved with only millisecond granularity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170850062 From jwaters at openjdk.org Wed Apr 19 07:06:49 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 19 Apr 2023 07:06:49 GMT Subject: RFR: 8250269: Replace ATTRIBUTE_ALIGNED with alignas [v15] In-Reply-To: References: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> Message-ID: On Wed, 12 Apr 2023 01:36:01 GMT, Kim Barrett wrote: > I've been meaning to review this but have been swamped. Sorry. > > I don't think this change to HotSpot should be combined with JDK-8305341 / PR#13258. > > I'm concerned there might be uses of ATTRIBUTE_ALIGNED in other places than at the front of the declaration (like the fixed offset_of macro in the proposed changes). 
Obviously there aren't any that break compilation. But is alignas in other places valid but with a different meaning? For a discussion of the kind of thing I'm concerned about, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108796 I get what you mean, I'll search HotSpot's codebase to see whether this issue pops up in any of our existing code. It's a little tempting to just replace every instance of this macro with alignas, but I guess I'll avoid doing that to keep the changes minimal ------------- PR Comment: https://git.openjdk.org/jdk/pull/11431#issuecomment-1514230350 From mbaesken at openjdk.org Wed Apr 19 07:12:41 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Apr 2023 07:12:41 GMT Subject: RFR: 8306278: jvmtiAgentList.cpp:253 assert(offset >= 0) failed: invariant occurs on AIX after JDK-8257967 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 16:59:29 GMT, Markus Grönlund wrote: > Greetings, > > For most platforms, os::dll_address_to_library_name() only sets offset = -1 in case of errors. If there is an error, the function returns false. This is fine. > > On AIX, the offset, being optional, is invariantly set to -1, even in the case of non-errors. > > Easiest to remove the assertion for a positive offset. > > Thanks > Markus Thanks, looks good to me. One of my colleagues has an idea to improve os::dll_address_to_library_name on AIX to support the offsets, but this is something for the future, so it is still good to have your fix. ------------- Marked as reviewed by mbaesken (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13513#pullrequestreview-1391391403 From dholmes at openjdk.org Wed Apr 19 07:30:46 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 07:30:46 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock In-Reply-To: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> Message-ID: On Mon, 3 Apr 2023 19:33:27 GMT, Coleen Phillimore wrote: > Also the method has to call nmethod->flush which is assumed to be something only GC calls. I'm not familiar with this code but this raises some concerns for me. Is there a way we can test this deallocation path? Other comments below. Thanks. src/hotspot/share/oops/method.cpp line 138: > 136: MetadataFactory::free_metadata(loader_data, method_counters()); > 137: clear_method_counters(); > 138: // The nmethod will be gone when we get here, for redefinition but not Nit: the comma should be after redefinition. But is that the only reason to get here? What about regular class unloading? The method comment on lines 129/130 also needs updating. src/hotspot/share/oops/method.cpp line 141: > 139: // for method handle intrinsics. > 140: if (code() != nullptr) { > 141: ((nmethod*)_code)->flush(); `flush` acquires the CodeCache_lock - could that be an issue in this call path? 
------------- PR Review: https://git.openjdk.org/jdk/pull/13308#pullrequestreview-1391344080 PR Review Comment: https://git.openjdk.org/jdk/pull/13308#discussion_r1170866278 PR Review Comment: https://git.openjdk.org/jdk/pull/13308#discussion_r1170867580 From alanb at openjdk.org Wed Apr 19 08:18:52 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 19 Apr 2023 08:18:52 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 05:43:19 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix gtests > > src/java.base/share/classes/java/lang/Thread.java line 576: > >> 574: long millis = NANOSECONDS.toMillis(nanos); >> 575: nanos -= MILLISECONDS.toNanos(millis); >> 576: sleep(millis, (int)nanos); > > This double conversion seems a bit kludgy - why not just keep the vthread check and call `sleep0(nanos)`? Yes, I wondering that too as the method has the sleep time in nanos already so more readable to just call sleep0(nanos). 
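A small Java sketch of the point raised above, using only `java.util.concurrent.TimeUnit`: splitting a nanosecond duration into whole milliseconds plus leftover nanoseconds is lossless, so a callee that works in nanoseconds only has to undo the split. The class and method names are hypothetical and are not part of `Thread.java`.

```java
import java.util.concurrent.TimeUnit;

final class SleepSplitSketch {
    // Mirrors the quoted snippet: whole milliseconds plus the sub-millisecond remainder.
    static long[] split(long nanos) {
        long millis = TimeUnit.NANOSECONDS.toMillis(nanos);
        long rest = nanos - TimeUnit.MILLISECONDS.toNanos(millis);
        return new long[] { millis, rest };
    }

    // What a nanosecond-based callee would have to do to get the original value back.
    static long recombine(long millis, long rest) {
        return TimeUnit.MILLISECONDS.toNanos(millis) + rest;
    }
}
```

Since the round trip returns the input unchanged, passing `nanos` straight through is equivalent and avoids the extra arithmetic.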
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1170984132 From shade at openjdk.org Wed Apr 19 09:06:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 09:06:59 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 05:50:00 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix gtests > > test/jdk/java/lang/Thread/SleepSanity.java line 75: > >> 73: } >> 74: >> 75: private static void testTimes(TestCase t, long min, long max) throws Exception { > > Not obvious without reading all the code that min and max are in ms. `msMin` and `msMax` might be clearer. Done. > test/jdk/java/lang/Thread/SleepSanity.java line 84: > >> 82: } >> 83: >> 84: private static void testTimeout(TestCase t, long timeout) throws Exception { > > Suggestion: s/timeout/millis/ again so unit is clear. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171043726 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171043842 From shade at openjdk.org Wed Apr 19 09:22:44 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 09:22:44 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 08:15:44 GMT, Alan Bateman wrote: >> src/java.base/share/classes/java/lang/Thread.java line 576: >> >>> 574: long millis = NANOSECONDS.toMillis(nanos); >>> 575: nanos -= MILLISECONDS.toNanos(millis); >>> 576: sleep(millis, (int)nanos); >> >> This double conversion seems a bit kludgy - why not just keep the vthread check and call `sleep0(nanos)`? > > Yes, I wondering that too as the method has the sleep time in nanos already so more readable to just call sleep0(nanos). True. I was trying to merge some paths in `Thread.java` in the first revision of the patch, but after recent refactorings, we can just call into `sleep0(nanos)` and be done with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171065587 From shade at openjdk.org Wed Apr 19 09:30:56 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 09:30:56 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 06:12:02 GMT, David Holmes wrote: >> Hmmm I changed that comment ... 
> > My actual comment was changed to: >> I suggest extending the comment to add > > This is how Thread.sleep(millis, nanos) has always behaved with only millisecond granularity. Reworded, added comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171072783 From shade at openjdk.org Wed Apr 19 09:30:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 09:30:59 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 05:58:35 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix gtests > > src/hotspot/os/windows/os_windows.cpp line 5259: > >> 5257: // any positive requested nanos as a full millisecond. >> 5258: jlong millis = align_up(nanos, NANOSECS_PER_MILLISEC) / NANOSECS_PER_MILLISEC; >> 5259: assert(nanos == 0 || millis != 0, "Only pass zero millis on zero nanos"); > > Not sure what this is trying to check. > > Nit: s/on/or/ Right, I thought we are allowed to pass zero downstream, but there is a `guarantee(Millis > 0)`, which would fail. So there is no reason to do this. Fixed. 
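For illustration, the round-up discussed in this thread can be sketched in plain Java. The real code is C++ and uses HotSpot's `align_up`; `roundUpToMillis` and the class name are hypothetical. The sketch only shows the arithmetic: any positive `nanos` yields at least one millisecond, and zero is produced only for zero input.

```java
final class WindowsSleepRounding {
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Ceiling division, written to avoid the overflow risk of (nanos + N - 1) / N.
    static long roundUpToMillis(long nanos) {
        if (nanos < 0) {
            throw new IllegalArgumentException("nanos must be non-negative");
        }
        return nanos / NANOS_PER_MILLI + (nanos % NANOS_PER_MILLI != 0 ? 1 : 0);
    }
}
```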
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171074176 From aturbanov at openjdk.org Wed Apr 19 09:30:59 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 19 Apr 2023 09:30:59 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v23] In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 06:36:51 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> ### Specdiff >> https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html >> >> ### Javadoc >> https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html >> >> ### Tests >> >> - [X] Tier1 >> - [X] Tier2 >> - [ ] Tier3 >> - [ ] Tier4 >> - [ ] Tier5 >> - [ ] Tier6 > > Per Minborg has updated the pull request incrementally with two additional commits since the last revision: > > - Merge pull request #3 from JornVernee/IsForeignLinkerSupported > > rename has_port > - rename has_port test/jdk/java/foreign/TestByteBuffer.java line 335: > 333: assertEquals(byteBuffer.isReadOnly(), segment.isReadOnly()); > 334: assertTrue(byteBuffer.isDirect()); > 335: } catch(IOException e) { nit Suggestion: } catch (IOException e) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1171075598 From alanb at openjdk.org Wed Apr 19 09:30:53 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 19 Apr 2023 09:30:53 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Tue, 18 Apr 2023 19:38:12 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. 
However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fix gtests You might need to check that test/hotspot/jtreg/vmTestbase/nsk/jdwp/ThreadReference/OwnedMonitorsStackDepthInfo/ownedMonitorsStackDepthInfo001/ownedMonitorsStackDepthInfo001a.java is passing. I haven't tried your changes but I remember needing to change this test when doing experimental changes in this area. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1514418262 From tschatzl at openjdk.org Wed Apr 19 09:49:55 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Apr 2023 09:49:55 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ [v2] In-Reply-To: References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> Message-ID: On Tue, 18 Apr 2023 11:41:53 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Missed fix There is one more `NULL` in HeapWord* G1CollectedHeap::humongous_obj_allocate_initialize_regions(HeapRegion* first_hr, uint num_regions, size_t word_size) { assert(first_hr != NULL, "pre-condition"); line 281. I do not seem to be able to anchor this suggestion in the github UI. Looks good otherwise.
------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12248#pullrequestreview-1391688739 From shade at openjdk.org Wed Apr 19 09:50:53 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 09:50:53 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 05:58:11 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix gtests > > src/hotspot/os/windows/os_windows.cpp line 5253: > >> 5251: >> 5252: int PlatformEvent::park_nanos(jlong nanos) { >> 5253: assert(0 <= nanos, "nanos are in range"); > > `nanos` should never be zero else you call the untimed park. Changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171098893 From shade at openjdk.org Wed Apr 19 09:56:44 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 09:56:44 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v4] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. 
This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. 
> > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request incrementally with six additional commits since the last revision: - Adjust assert - Replace (park|sleep)_millis back with just (park|sleep) - More review touchups - Revert some Thread refactorings - Add a few missing sleep_millis renames - Adjust the test a bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13225/files - new: https://git.openjdk.org/jdk/pull/13225/files/33fa34f1..f78aef54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=02-03 Stats: 57 lines in 16 files changed: 23 ins; 2 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From shade at openjdk.org Wed Apr 19 09:56:50 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 09:56:50 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 06:14:37 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix gtests > > src/hotspot/os/posix/os_posix.cpp line 1545: > >> 1543: >> 1544: int PlatformEvent::park_nanos(jlong nanos) { >> 1545: assert(0 <= nanos, "nanos are in range"); > > `nanos` should never be zero else you call the untimed park. OK, I see how is that guaranteed in the Windows case. 
In POSIX case, calling `park()` is untimed wait, but `park(0)` is converted to absolute time that is already passed, and so `pthread_cond_timedwait` would return immediately, right? So `park(0)` is not equivalent to just `park()`? Still, the strongest behavior from Windows case takes precedence here. Changed the assert. > src/hotspot/os/posix/park_posix.hpp line 57: > >> 55: void park(); >> 56: int park_millis(jlong millis); >> 57: int park_nanos(jlong nanos); > > Still not sure we need this API split but if we keep `park(jlong millis)` and just add `park_nanos(jlong nanos)` then you can avoid touching so many places in the code. I thought the exposure to `park` -> `park_millis` renames would be smaller, but apparently there is a considerable number of uses. I left `park(millis)` (old) and added `park_nanos(nanos)` (new), and reverted `park_millis` changes. > src/hotspot/share/runtime/javaThread.hpp line 1145: > >> 1143: public: >> 1144: bool sleep_millis(jlong millis); >> 1145: bool sleep_nanos(jlong nanos); > > I prefer just one sleep that takes nanos. If we do only `sleep(jlong nanos)`, then there is an accident waiting to happen, when some unfixed code would call `sleep` with `millis` argument, not knowing it is now `nanos`. That was the reason why I made the names explicit. `sleep_millis` also does the conversion to nanos that does not overflow. But, like with `park` above, I think there is an argument to keep `sleep(millis)` and add `sleep_nanos(nanos)`, to keep code changes at minimum. 
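The remark that `sleep_millis` converts to nanos without overflowing can be pictured with a clamping conversion. This is a sketch in Java under the assumption that saturation is the intended behavior; the HotSpot conversion itself is not shown in this thread, and the names below are hypothetical.

```java
final class MillisToNanosSketch {
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Multiply by 1_000_000, clamping to Long.MAX_VALUE instead of wrapping around.
    static long millisToNanosSaturating(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("millis must be non-negative");
        }
        if (millis > Long.MAX_VALUE / NANOS_PER_MILLI) {
            return Long.MAX_VALUE;
        }
        return millis * NANOS_PER_MILLI;
    }
}
```

Keeping the unit in the method name, as argued above, makes it harder for a caller to pass milliseconds where nanoseconds are expected.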
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171103473 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171103664 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171103560 From shade at openjdk.org Wed Apr 19 10:02:46 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 10:02:46 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 19 Apr 2023 09:27:49 GMT, Alan Bateman wrote: > You might need to check that test/hotspot/jtreg/vmTestbase/nsk/jdwp/ThreadReference/OwnedMonitorsStackDepthInfo/ownedMonitorsStackDepthInfo001/ownedMonitorsStackDepthInfo001a.java is passing. I haven't tried your changes but I remember needing to change this test when doing experimental changes in this area. Oh, so that test checks the owned monitors are in frames at particular stack depths! So my previous PR breaks that test because it makes a delegated call to `sleep` in `java.lang.Thread`: all actual offsets are off by one. New PR drops that delegation and the entirety of `vmTestbase/nsk/jdwp/ThreadReference` now passes. 
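The off-by-one effect described above can be reproduced in miniature with `java.lang.StackWalker`. This is only an illustrative sketch with hypothetical names; the JDWP test itself inspects owned monitors per frame, which is not modeled here.

```java
final class FrameDepthSketch {
    // Number of Java frames on the current thread, counted from this method downwards.
    static long depth() {
        return StackWalker.getInstance().<Long>walk(frames -> frames.count());
    }

    // Calls depth() directly.
    static long direct() {
        return depth();
    }

    // Adds one delegating call in between, as the earlier revision of the patch did for sleep.
    static long delegated() {
        return direct();
    }
}
```

When both methods are called from the same caller, `delegated()` reports exactly one more frame than `direct()`, which is the kind of shift a test with hard-coded stack depths would trip over.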
------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1514461529 From shade at openjdk.org Wed Apr 19 10:13:01 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 10:13:01 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 06:07:04 GMT, David Holmes wrote: > You seemed to have missed: > > ``` > ./cpu/x86/rdtsc_x86.cpp: JavaThread::current()->sleep(FT_SLEEP_MILLISECS); > ./share/compiler/compileBroker.cpp: sleep(DeoptimizeObjectsALotInterval); > ``` > > so not sure how this is building ??? Should not code when tired, having only Mac M1 on my hands! Argh. Fun fact: at least on Darwin, we get standard `sleep` in `CompileBroker` if we rename `JavaThread::sleep` to something else. I think we would need to fully qualify those uses to avoid this trap in future -- I'll do a separate PR for this. Other comments should be addressed in the series of new commits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1514476031 From duke at openjdk.org Wed Apr 19 10:25:49 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 19 Apr 2023 10:25:49 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: References: Message-ID: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> > - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. > > - The `-fcheck-new` is removed from the gcc compile flags. 
> > - The `operator new` and `operator delete` are deleted from `StackObj`. > > - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. > - The `Thread::operator new`with and without `null` return are removed. > > ### Tests > local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8305590: Remove nothrow exception specifications from operator new ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13498/files - new: https://git.openjdk.org/jdk/pull/13498/files/45a4e5de..d2d75e7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13498&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13498&range=00-01 Stats: 24 lines in 6 files changed: 0 ins; 14 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13498/head:pull/13498 PR: https://git.openjdk.org/jdk/pull/13498 From duke at openjdk.org Wed Apr 19 10:25:53 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 19 Apr 2023 10:25:53 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 18:29:13 GMT, Kim Barrett wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8305590: Remove nothrow exception specifications from operator new > > src/hotspot/share/memory/allocation.hpp line 289: > >> 287: void* operator new [](size_t size) throw() = delete; >> 288: void operator delete(void* p) = delete; >> 289: void operator delete [](void* p) = delete; > > Making these deleted functions public might provide better error messages if someone accidentally attempts to reference them. Done. 
> src/hotspot/share/memory/allocation.hpp line 504: > >> 502: // Arena allocations >> 503: void* operator new(size_t size, Arena *arena); >> 504: void* operator new [](size_t size, Arena *arena) = delete; > > `operator new[](size_t)` (down below, where github won't let me comment directly) should also have it's nothrow exception-spec removed. Done. > src/hotspot/share/prims/jvmtiRawMonitor.hpp line 114: > >> 112: >> 113: // Non-aborting operator new >> 114: void* operator new(size_t size) { > > This change is incorrect, as this can quite obviously return null. And that seems to be intentional. > Presumably the callers are checking for a possible null allocation result (else there is a bug). I think > it would be less confusing if this took a `std::nothrow_t` to be explicit about it's behavior, and updated > the caller(s) accordingly. That would match the usual idiom. no throw argument is added to the declaration and the caller is changed accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1171133732 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1171134794 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1171136374 From duke at openjdk.org Wed Apr 19 10:25:56 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 19 Apr 2023 10:25:56 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: <-cM0_BrloWCZxEYY8rbloXrXe1_mcQscU3ghJc1TE2I=.94b5de60-cfc8-4263-b37e-dae00e4577bc@github.com> References: <-cM0_BrloWCZxEYY8rbloXrXe1_mcQscU3ghJc1TE2I=.94b5de60-cfc8-4263-b37e-dae00e4577bc@github.com> Message-ID: On Tue, 18 Apr 2023 18:59:31 GMT, Kim Barrett wrote: >> src/hotspot/share/runtime/thread.hpp line 203: >> >>> 201: static bool is_JavaThread_protected_by_TLH(const JavaThread* target); >>> 202: >>> 203: void operator delete(void* p); >> >> Should you have removed delete and Thread::allocate() also? is Thread::allocate now unused? 
> > I was thinking the same thing. Removed and tested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1171133528 From tschatzl at openjdk.org Wed Apr 19 10:47:51 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Apr 2023 10:47:51 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v7] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 20:16:06 GMT, Ioi Lam wrote: >> This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. >> >> **Notes for reviewers:** >> - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. >> - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). >> - It might be easier to see the diff with whitespaces off. >> - There are two major changes in the G1 code >> - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) >> - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) >> - Testing changes: >> - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. >> - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. 
>> >> **Testing:** >> - Mach5 tiers 1 ~ 7 > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into 8298048-combine-cds-heap-to-single-region-PUSH > - Fixed assert in runtime/cds/appcds/SharedArchiveConsistency.java > - Removal of JFR custom closed/open archive region types > - Remove g1 full gc skip marking optimization > - Some comment updates > - Move g1collectedheap archive related regions together in the cpp file > - Factor out region/range iteration > - Fix comment > - Ioi fix > - fixed merge > - ... and 11 more: https://git.openjdk.org/jdk/compare/0f3828dd...8a35c7ee GC changes seem good. Only cursorily looked at runtime/CDS changes. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13284#pullrequestreview-1391786221 From mgronlun at openjdk.org Wed Apr 19 10:59:44 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 19 Apr 2023 10:59:44 GMT Subject: RFR: 8306278: jvmtiAgentList.cpp:253 assert(offset >= 0) failed: invariant occurs on AIX after JDK-8257967 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:15:46 GMT, Serguei Spitsyn wrote: >> Greetings, >> >> For most platforms, os::dll_address_to_library_name() only sets offset = -1 in case of errors. If there is an error, the function returns false. This is fine. >> >> On AIX, the offset, being optional, is invariantly set to -1, even in the case of non-errors. >> >> Easiest to remove the assertion for a positive offset. >> >> Thanks >> Markus > > Looks good to me. > Thanks, > Serguei Thank you @sspitsyn, @dholmes-ora and @MBaesken for your reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13513#issuecomment-1514533473 From mgronlun at openjdk.org Wed Apr 19 11:02:54 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 19 Apr 2023 11:02:54 GMT Subject: Integrated: 8306278: jvmtiAgentList.cpp:253 assert(offset >= 0) failed: invariant occurs on AIX after JDK-8257967 In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 16:59:29 GMT, Markus Grönlund wrote: > Greetings, > > For most platforms, os::dll_address_to_library_name() only sets offset = -1 in case of errors. If there is an error, the function returns false. This is fine. > > On AIX, the offset, being optional, is invariantly set to -1, even in the case of non-errors. > > Easiest to remove the assertion for a positive offset. > > Thanks > Markus This pull request has now been integrated. Changeset: c738c8ea Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/c738c8ea3e9fda87abb03acb599a2433a344db09 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8306278: jvmtiAgentList.cpp:253 assert(offset >= 0) failed: invariant occurs on AIX after JDK-8257967 Reviewed-by: sspitsyn, dholmes, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/13513 From mgronlun at openjdk.org Wed Apr 19 11:11:56 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 19 Apr 2023 11:11:56 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 [v3] In-Reply-To: <8bWAmer1c40K83zNOqk4t96rJ8xPG-lVyjBXpeGTp2M=.a4fead0c-23c2-4575-b5f7-5782c874fa5d@github.com> References: <7Mtu4sDUSEqgO7yYhYPVT-aU0XmHOObSRnPOQqDcIic=.443775c9-0d30-4c6f-9815-0133f57a2e00@github.com> <8bWAmer1c40K83zNOqk4t96rJ8xPG-lVyjBXpeGTp2M=.a4fead0c-23c2-4575-b5f7-5782c874fa5d@github.com> Message-ID: On Wed, 19 Apr 2023 02:22:38 GMT, David Holmes wrote: > @mgronlun I agree this does not look good.
I'm not sure this was the right way to conditionalize the new code, rather than ensuring the callsites were conditionalized on INCLUDE_JVMTI. It follows the same pattern as for other jvmti*.cpp files also excluded via make/hotspot/lib/jvmFeatures.gmk. For example, jvmtiExport.cpp. Might be better to improve with conditionalized callsites, agree. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13512#issuecomment-1514547771 From kbarrett at openjdk.org Wed Apr 19 11:13:49 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 19 Apr 2023 11:13:49 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> Message-ID: On Wed, 19 Apr 2023 10:25:49 GMT, Afshin Zafari wrote: >> - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. >> >> - The `-fcheck-new` is removed from the gcc compile flags. >> >> - The `operator new` and `operator delete` are deleted from `StackObj`. >> >> - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. >> - The `Thread::operator new`with and without `null` return are removed. 
>> >> ### Tests >> local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8305590: Remove nothrow exception specifications from operator new src/hotspot/share/prims/jvmtiRawMonitor.hpp line 114: > 112: > 113: // Non-aborting operator new > 114: void* operator new(size_t size, const std::nothrow_t& nothrow_constant) throw() { Hm, now I'm wondering why isn't an `operator delete` to go with this? Or are these objects never deleted? Otherwise I'd have thought we'd get the same mismatched new/delete warning you encountered elsewhere. If they're never supposed to be deleted, then giving `operator delete` a deleted definition here seems appropriate, to prevent accidentally calling the CHeapObj function. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1171175550 From dholmes at openjdk.org Wed Apr 19 11:57:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 11:57:45 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> Message-ID: On Wed, 19 Apr 2023 10:25:49 GMT, Afshin Zafari wrote: >> - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. >> >> - The `-fcheck-new` is removed from the gcc compile flags. >> >> - The `operator new` and `operator delete` are deleted from `StackObj`. >> >> - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. 
>> - The `Thread::operator new`with and without `null` return are removed. >> >> ### Tests >> local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8305590: Remove nothrow exception specifications from operator new src/hotspot/share/jfr/utilities/jfrAllocation.hpp line 2: > 1: /* > 2: * Copyright (c) 2014, 2023, Oracle and/or its affiliates. All rights reserved. This appears to be the only change to this file so should be reverted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1171223970 From coleenp at openjdk.org Wed Apr 19 12:30:30 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 12:30:30 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock [v2] In-Reply-To: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> Message-ID: > This patch releases the InvokeMethodTable_lock while creating a method handle intrinsic. If there's a race, it frees a Method created by racing thread. The logic is simple but uses the deallocate_list infrastructure that's mostly used for redefinition making it less rare. With Dacapo2009, this adds about 20 Methods + constant pools to the list. Also the method has to call nmethod->flush which is assumed to be something only GC calls. > > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Update method deallocate_contents comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13308/files - new: https://git.openjdk.org/jdk/pull/13308/files/ed40ee56..7cb29b91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13308&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13308&range=00-01 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13308/head:pull/13308 PR: https://git.openjdk.org/jdk/pull/13308 From coleenp at openjdk.org Wed Apr 19 12:31:52 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 12:31:52 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock [v2] In-Reply-To: References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> Message-ID: On Wed, 19 Apr 2023 06:33:12 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Update method deallocate_contents comments. > > src/hotspot/share/oops/method.cpp line 138: > >> 136: MetadataFactory::free_metadata(loader_data, method_counters()); >> 137: clear_method_counters(); >> 138: // The nmethod will be gone when we get here, for redefinition but not > > Nit: the comma should be after redefinition. But is that the only reason to get here? What about regular class unloading? > > The method comment on lines 129/130 also needs updating. I updated the comment. Regular class unloading does not get here. The Metaspace containing the Method is released, not requiring individual metadata deallocation. This is tested with normal execution when the race to create method handle intrinsic Method is lost. I counted ~20 with logging for Dacapo. 
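For readers following the thread, the "lose the race" handling described in this PR (create the object without holding the table lock, publish it under the lock, and queue the loser's copy for later cleanup) can be sketched roughly as below. This is only an illustration in Java, not the HotSpot C++ change: `IntrinsicTable`, `findOrCreate`, `deallocateList` and `pendingDeallocations` are hypothetical names, and the list merely stands in for the deallocate_list infrastructure mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in HotSpot.
final class IntrinsicTable<K, V> {
    private final Object lock = new Object();
    private final Map<K, V> table = new HashMap<>();
    // Stand-in for a deferred-free list: values created by threads that lost the race.
    private final List<V> deallocateList = new ArrayList<>();

    V findOrCreate(K key, Function<K, V> factory) {
        synchronized (lock) {
            V existing = table.get(key);
            if (existing != null) {
                return existing;
            }
        }
        // The lock is released here, so the factory may run arbitrary code.
        V created = factory.apply(key);
        synchronized (lock) {
            V winner = table.putIfAbsent(key, created);
            if (winner == null) {
                return created; // this caller published its value first
            }
            // Another caller published first: keep its value, queue ours for cleanup.
            deallocateList.add(created);
            return winner;
        }
    }

    int pendingDeallocations() {
        synchronized (lock) {
            return deallocateList.size();
        }
    }
}
```

The trade-off in this sketch is that creation work may occasionally be duplicated and thrown away, in exchange for never running the factory while the lock is held.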
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13308#discussion_r1171266019 From fparain at openjdk.org Wed Apr 19 13:02:43 2023 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 19 Apr 2023 13:02:43 GMT Subject: RFR: 8306310: Move is_shared Klass flag In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:06:03 GMT, Coleen Phillimore wrote: > Please review this simple patch to move the is_shared_class flag to Klass, CDS flags. The eventual goal is to have AccessFlags only be ones defined in the classfile. > Tested with tier1-4. Looks good to me. Thank you for making AccessFlags even cleaner. ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13514#pullrequestreview-1392010399 From dholmes at openjdk.org Wed Apr 19 13:08:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 13:08:53 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v4] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 19 Apr 2023 09:56:44 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. 
Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with six additional commits since the last revision: > > - Adjust assert > - Replace (park|sleep)_millis back with just (park|sleep) > - More review touchups > - Revert some Thread refactorings > - Add a few missing sleep_millis renames > - Adjust the test a bit Further to Alan's comment about checking tests, I think this may also impact the strace00n tests that are currently being fixed by https://github.com/openjdk/jdk/pull/13476 - the changes in Thread.java may change the maximum stack depth. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1514702679 From fparain at openjdk.org Wed Apr 19 13:09:44 2023 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 19 Apr 2023 13:09:44 GMT Subject: RFR: 8306123: Move InstanceKlass writeable flags In-Reply-To: References: Message-ID: <0eCwdxDqCvbPfBVrzu-CXBQY-Ob8k5y_TOqSH2pA34w=.9b423d4f-7766-4610-b96f-3842a91915b1@github.com> On Tue, 18 Apr 2023 17:11:25 GMT, Coleen Phillimore wrote: > Please review this patch to move the writeable Klass AccessFlags to InstanceKlassFlags. > Tested with tier1-4. Nice clean up. Looks good to me. ------------- Marked as reviewed by fparain (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13515#pullrequestreview-1392028279 From dholmes at openjdk.org Wed Apr 19 13:17:48 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Apr 2023 13:17:48 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v3] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <4mZ9m57CMeqwSLbTwnxfjs9g_dvI4bPEHcYgbND02xQ=.c9585bb0-8436-4d99-a862-4295465190be@github.com> Message-ID: On Wed, 19 Apr 2023 09:51:48 GMT, Aleksey Shipilev wrote: >> src/hotspot/os/posix/os_posix.cpp line 1545: >> >>> 1543: >>> 1544: int PlatformEvent::park_nanos(jlong nanos) { >>> 1545: assert(0 <= nanos, "nanos are in range"); >> >> `nanos` should never be zero else you call the untimed park. > > OK, I see how is that guaranteed in the Windows case. In POSIX case, calling `park()` is untimed wait, but `park(0)` is converted to absolute time that is already passed, and so `pthread_cond_timedwait` would return immediately, right? So `park(0)` is not equivalent to just `park()`? Still, the strongest behavior from Windows case takes precedence here. Changed the assert. Posix is missing the assertion that Windows has, but if you check the callers you will find we never pass 0. 
The typical pattern is: if (millis <= 0) { self->_ParkEvent->park(); } else { self->_ParkEvent->park(millis); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171323075 From pminborg at openjdk.org Wed Apr 19 13:37:15 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 19 Apr 2023 13:37:15 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v24] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html > > ### Tests > > - [X] Tier1 > - [X] Tier2 > - [ ] Tier3 > - [ ] Tier4 > - [ ] Tier5 > - [ ] Tier6 Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 42 commits: - Merge branch 'master' into PR_21_V2 - Update test/jdk/java/foreign/TestByteBuffer.java Co-authored-by: Andrey Turbanov - Merge pull request #3 from JornVernee/IsForeignLinkerSupported rename has_port - rename has_port - Merge pull request #2 from JornVernee/WSL_BB account for missing functional in WSL in TestByteBuffer - account for missing mincore on WSL in TestByteBuffer - Merge branch 'master' into PR_21_V2 - 8305369: Issues in zero-length memory segment javadoc section - 8305087: MemoryLayout API checks should be more eager - Merge master - ... 
and 32 more: https://git.openjdk.org/jdk/compare/9fb53adf...ba04f5cc ------------- Changes: https://git.openjdk.org/jdk/pull/13079/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=23 Stats: 13421 lines in 270 files changed: 5102 ins; 6182 del; 2137 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From shade at openjdk.org Wed Apr 19 13:42:50 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 13:42:50 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v4] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 19 Apr 2023 09:56:44 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. 
I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with six additional commits since the last revision: > > - Adjust assert > - Replace (park|sleep)_millis back with just (park|sleep) > - More review touchups > - Revert some Thread refactorings > - Add a few missing sleep_millis renames > - Adjust the test a bit > Further to Alan's comment about checking tests, I think this may also impact the strace00n tests that are currently being fixed by #13476 - the changes in Thread.java may change the maximum stack depth. I think current PR does not break Java stack tests anymore, because it does not change the depth of Java stacks. I ran `vmTestbase/nsk/monitoring/stress/thread` without problems with this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1514756102 From coleenp at openjdk.org Wed Apr 19 14:06:46 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 14:06:46 GMT Subject: RFR: 8306123: Move InstanceKlass writeable flags In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:11:25 GMT, Coleen Phillimore wrote: > Please review this patch to move the writeable Klass AccessFlags to InstanceKlassFlags. > Tested with tier1-4. Thanks Ioi and Fred. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13515#issuecomment-1514794910 From coleenp at openjdk.org Wed Apr 19 14:10:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 14:10:12 GMT Subject: RFR: 8306310: Move is_shared Klass flag In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:06:03 GMT, Coleen Phillimore wrote: > Please review this simple patch to move the is_shared_class flag to Klass, CDS flags. The eventual goal is to have AccessFlags only be ones defined in the classfile. > Tested with tier1-4. Thanks Ioi and Fred. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13514#issuecomment-1514794305 From coleenp at openjdk.org Wed Apr 19 14:10:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 14:10:12 GMT Subject: Integrated: 8306123: Move InstanceKlass writeable flags In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:11:25 GMT, Coleen Phillimore wrote: > Please review this patch to move the writeable Klass AccessFlags to InstanceKlassFlags. > Tested with tier1-4. This pull request has now been integrated. Changeset: ddb86469 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/ddb86469e024147ab41db7dd26344ba9e14ce17a Stats: 114 lines in 6 files changed: 64 ins; 34 del; 16 mod 8306123: Move InstanceKlass writeable flags Reviewed-by: iklam, fparain ------------- PR: https://git.openjdk.org/jdk/pull/13515 From coleenp at openjdk.org Wed Apr 19 14:10:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 14:10:12 GMT Subject: Integrated: 8306310: Move is_shared Klass flag In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 17:06:03 GMT, Coleen Phillimore wrote: > Please review this simple patch to move the is_shared_class flag to Klass, CDS flags. The eventual goal is to have AccessFlags only be ones defined in the classfile. > Tested with tier1-4. This pull request has now been integrated. 
Changeset: 1a41e12c Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/1a41e12c22168c6c50c6bc193ae249a4a390173c Stats: 15 lines in 2 files changed: 10 ins; 5 del; 0 mod 8306310: Move is_shared Klass flag Reviewed-by: iklam, fparain ------------- PR: https://git.openjdk.org/jdk/pull/13514 From duke at openjdk.org Wed Apr 19 14:49:49 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 19 Apr 2023 14:49:49 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> Message-ID: On Wed, 19 Apr 2023 11:00:32 GMT, Kim Barrett wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8305590: Remove nothrow exception specifications from operator new > > src/hotspot/share/prims/jvmtiRawMonitor.hpp line 114: > >> 112: >> 113: // Non-aborting operator new >> 114: void* operator new(size_t size, const std::nothrow_t& nothrow_constant) throw() { > > Hm, now I'm wondering why isn't an `operator delete` to go with this? Or are these objects > never deleted? Otherwise I'd have thought we'd get the same mismatched new/delete warning > you encountered elsewhere. If they're never supposed to be deleted, then giving `operator delete` > a deleted definition here seems appropriate, to prevent accidentally calling the CHeapObj function. This `operator new` just calls the `CHeapObj::operator new` with nothrow argument. So changing the caller will call the right one in `CHeapObj`. This object is deleted in https://github.com/openjdk/jdk/blob/c738c8ea3e9fda87abb03acb599a2433a344db09/src/hotspot/share/prims/jvmtiEnv.cpp#L3699 and this will call the `CHeapObj::operator delete` which is the right one. So this `operator new` is not needed since I changed the caller. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1171457189 From coleenp at openjdk.org Wed Apr 19 15:53:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 15:53:54 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock [v2] In-Reply-To: References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> Message-ID: <9adgTpkD3w5UQxjSDBuj2_Tuo4wj37BM6xwoogsd6l8=.31809977-8032-4220-afe7-bc7ac3d986e5@github.com> On Wed, 19 Apr 2023 06:34:57 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Update method deallocate_contents comments. > > src/hotspot/share/oops/method.cpp line 141: > >> 139: // for method handle intrinsics. >> 140: if (code() != nullptr) { >> 141: ((nmethod*)_code)->flush(); > > `flush` acquires the CodeCache_lock - could that be an issue in this call path? The call path to this is during a safepoint for cleaning up the deallocate lists, and the CodeCache_lock is a pretty low level no-safepoint lock, so the lock is fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13308#discussion_r1171542403 From shade at openjdk.org Wed Apr 19 16:02:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 16:02:08 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v5] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. 
However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Windows fixes: align(...) 
is only for power-of-two alignments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13225/files - new: https://git.openjdk.org/jdk/pull/13225/files/f78aef54..29c7df36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=03-04 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From dnsimon at openjdk.org Wed Apr 19 16:05:08 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 19 Apr 2023 16:05:08 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v9] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 07:27:47 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... 
>> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - added breadcrumb in AnnotationParser about considering JVMCI should new annotation element types be added > - fixed javadoc comment Thanks for the reviews @turbanoff , @vnkozlov and @jddarcy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1514985140 From dnsimon at openjdk.org Wed Apr 19 16:05:13 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 19 Apr 2023 16:05:13 GMT Subject: Integrated: 8303431: [JVMCI] libgraal annotation API In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 18:07:34 GMT, Doug Simon wrote: > This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: > * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. > * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. 
> > To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): > > ResolvedJavaMethod method = ...; > ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); > return switch (a.kind()) { > case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; > case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The same code using the new API: > > > ResolvedJavaMethod method = ...; > ResolvedJavaType explodeLoopType = ...; > AnnotationData a = method.getAnnotationDataFor(explodeLoopType); > return switch (a.getEnum("kind").getName()) { > case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; > case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. This pull request has now been integrated. Changeset: 48fd4f2b Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/48fd4f2bd37562a159e4089b15aa108e0b1bebeb Stats: 2321 lines in 34 files changed: 2270 ins; 23 del; 28 mod 8303431: [JVMCI] libgraal annotation API Reviewed-by: kvn, never, darcy ------------- PR: https://git.openjdk.org/jdk/pull/12810 From duke at openjdk.org Wed Apr 19 16:34:50 2023 From: duke at openjdk.org (ExE Boss) Date: Wed, 19 Apr 2023 16:34:50 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v5] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 19 Apr 2023 16:02:08 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. 
However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Windows fixes: align(...) is only for power-of-two alignments src/java.base/share/classes/java/lang/Thread.java line 516: > 514: } > 515: > 516: private static native void sleep0(long nanos) throws InterruptedException; Maybe name this `sleepNanos0`?
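As context for the `sleep0(long nanos)` signature quoted above, here is a minimal sketch of how a `(millis, nanos)` pair could be folded into a single nanosecond count before being handed to such a native method. It is an assumption for illustration, not the code in the PR: `SleepNanos` and `toTotalNanos` are hypothetical names, and the argument checks simply mirror the documented contract of `Thread.sleep(long, int)`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of java.lang.Thread.
final class SleepNanos {
    private SleepNanos() {}

    // Combines millis and nanos into one nanosecond value, saturating at Long.MAX_VALUE.
    static long toTotalNanos(long millis, int nanos) {
        if (millis < 0) {
            throw new IllegalArgumentException("timeout value is negative");
        }
        if (nanos < 0 || nanos > 999_999) {
            throw new IllegalArgumentException("nanosecond timeout value out of range");
        }
        // TimeUnit.toNanos saturates at Long.MAX_VALUE instead of overflowing.
        long total = TimeUnit.MILLISECONDS.toNanos(millis);
        // Guard the final addition against overflow as well.
        if (total > Long.MAX_VALUE - nanos) {
            return Long.MAX_VALUE;
        }
        return total + nanos;
    }
}
```

With a conversion along these lines, a request such as `sleep(0, 500_000)` keeps its sub-millisecond value rather than being rounded up to a full millisecond; how precisely it is honoured still depends on the platform timers, as the PR description notes.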
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171590875 From kvn at openjdk.org Wed Apr 19 17:16:50 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Apr 2023 17:16:50 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> Message-ID: On Mon, 17 Apr 2023 16:17:30 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests. Remember previous reducible Phis. 
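For readers who have not seen an "allocation merge" before, here is a minimal Java sketch of the code shape the description above talks about: two allocations that meet at a control-flow merge (a Phi in C2's IR) and are then used only as a base for field loads. The class and method names (`AllocationMergeSketch`, `Point`, `mergedLoad`) are hypothetical and are not taken from the PR or its tests. Whether C2 actually scalar-replaces these objects depends on the JVM build, flags, and inlining decisions, so treat this as an illustration of the pattern rather than a reproducer.

```java
public class AllocationMergeSketch {
    // Hypothetical value holder; any small non-escaping class would do.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Two allocations merge at the end of the conditional. In compiler IR
    // terms, `p` is a Phi over two allocation results. The only users of the
    // merged value are the field loads p.x and p.y, and `p` never escapes.
    static int mergedLoad(boolean cond, int x, int y) {
        Point p = cond ? new Point(x, y) : new Point(y, x);
        return p.x - p.y;
    }

    public static void main(String[] args) {
        // Call the method many times so a JIT compiler has a chance to
        // compile it; the result itself does not depend on compilation.
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += mergedLoad((i & 1) == 0, i, i + 1);
        }
        System.out.println("sum = " + sum);
    }
}
```

Running this with a debug JVM and escape-analysis tracing would be one way to see whether the merge gets reduced, but the exact diagnostic flags are outside what this thread states.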
Submitted new testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1515089566 From shade at openjdk.org Wed Apr 19 18:40:50 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 18:40:50 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v5] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 19 Apr 2023 16:32:05 GMT, ExE Boss wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Windows fixes: align(...) is only for power-of-two alignments > > src/java.base/share/classes/java/lang/Thread.java line 516: > >> 514: } >> 515: >> 516: private static native void sleep0(long nanos) throws InterruptedException; > > Maybe name this `sleepNanos0`? No, I don't think so. The name of this function is not relevant, as nothing else is calling it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171725698 From shade at openjdk.org Wed Apr 19 18:46:49 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 18:46:49 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v6] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: <-sDYWBguIUjiok6GP3WWaDkIKNLCCg6PcOkGkApY1lc=.87611b1b-f317-4eb8-b030-f36d8c93069c@github.com> > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep.
This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. 
> > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Windows again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13225/files - new: https://git.openjdk.org/jdk/pull/13225/files/29c7df36..0e05a2f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From alanb at openjdk.org Wed Apr 19 18:53:48 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 19 Apr 2023 18:53:48 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v5] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: <_d0byYtDkJ2aDfgwQjWMSas2GtW359jasdHJOvAkBg4=.87bef233-43c6-4bf1-8986-80761cd333f1@github.com> On Wed, 19 Apr 2023 16:02:08 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. 
The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Windows fixes: align(...) is only for power-of-two alignments test/jdk/java/lang/Thread/SleepSanity.java line 48: > 46: > 47: for (final int millis : TRY_MILLIS) { > 48: testTimes(() -> Thread.sleep(millis), millis, 10_000); I wonder if 10s is enough a tolerance. JDK-8303633 has some sightings of sleep(1000) taking 5.5s. We've since changed all of these to 20s. It might be that 10s is okay, it's just that we seem to run on some test systems (Windows mostly) where we need very high tolerance for tests like this. 
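To make the tolerance discussion concrete, here is a minimal Java sketch of the kind of check being debated: time a `Thread.sleep(millis, nanos)` call and accept it only if the observed duration stays under the requested time plus a generous upper tolerance. This is not the `SleepSanity` test from the PR; the class and helper names (`SleepToleranceSketch`, `timedSleepNanos`, `withinTolerance`) are hypothetical, and the 20-second tolerance simply mirrors the figure mentioned in the comment above.

```java
import java.util.concurrent.TimeUnit;

public class SleepToleranceSketch {
    // Sleeps for the requested time and returns the observed duration in
    // nanoseconds. An interrupt is recorded on the thread rather than thrown.
    static long timedSleepNanos(long millis, int nanos) {
        long start = System.nanoTime();
        try {
            Thread.sleep(millis, nanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return System.nanoTime() - start;
    }

    // True if the observed time does not exceed the requested time by more
    // than the given tolerance. Only the upper bound is checked here.
    static boolean withinTolerance(long observedNanos, long requestedNanos, long toleranceMillis) {
        return observedNanos <= requestedNanos + TimeUnit.MILLISECONDS.toNanos(toleranceMillis);
    }

    public static void main(String[] args) {
        long requestedNanos = 500_000L; // 0.5 ms, a sub-millisecond request
        long observedNanos = timedSleepNanos(0, 500_000);
        System.out.printf("requested %d ns, observed %d ns, ok=%b%n",
                requestedNanos, observedNanos,
                withinTolerance(observedNanos, requestedNanos, 20_000));
    }
}
```

The observed time depends on the JDK version and the platform timer, so the sketch prints it instead of asserting a particular value. A wide upper bound such as 20s trades sensitivity for stability on heavily loaded test machines, which is the trade-off raised in the comment.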
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171736726 From coleenp at openjdk.org Wed Apr 19 19:33:43 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 19:33:43 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock [v3] In-Reply-To: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> Message-ID: <83vphN_tVH1YDMrlNwvDTBPypjcPhH0YTEM5DBXP7Eg=.8fcfb9ef-29d8-4b48-8e76-cb3796d9545b@github.com> > This patch releases the InvokeMethodTable_lock while creating a method handle intrinsic. If there's a race, it frees a Method created by racing thread. The logic is simple but uses the deallocate_list infrastructure that's mostly used for redefinition making it less rare. With Dacapo2009, this adds about 20 Methods + constant pools to the list. Also the method has to call nmethod->flush which is assumed to be something only GC calls. > > Tested with tier1-4. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into invoketable - Update method deallocate_contents comments. 
- 8305252: make_method_handle_intrinsic may call java code under a lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13308/files - new: https://git.openjdk.org/jdk/pull/13308/files/7cb29b91..023b56a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13308&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13308&range=01-02 Stats: 4810 lines in 112 files changed: 4275 ins; 284 del; 251 mod Patch: https://git.openjdk.org/jdk/pull/13308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13308/head:pull/13308 PR: https://git.openjdk.org/jdk/pull/13308 From shade at openjdk.org Wed Apr 19 19:55:38 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 19:55:38 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v7] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. 
Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Adjust test times ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13225/files - new: https://git.openjdk.org/jdk/pull/13225/files/0e05a2f8..8617b5ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From shade at openjdk.org Wed Apr 19 19:55:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Apr 2023 19:55:42 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v5] In-Reply-To: <_d0byYtDkJ2aDfgwQjWMSas2GtW359jasdHJOvAkBg4=.87bef233-43c6-4bf1-8986-80761cd333f1@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <_d0byYtDkJ2aDfgwQjWMSas2GtW359jasdHJOvAkBg4=.87bef233-43c6-4bf1-8986-80761cd333f1@github.com> Message-ID: On Wed, 19 Apr 2023 18:50:27 GMT, Alan Bateman wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Windows fixes: align(...) 
is only for power-of-two alignments > > test/jdk/java/lang/Thread/SleepSanity.java line 48: > >> 46: >> 47: for (final int millis : TRY_MILLIS) { >> 48: testTimes(() -> Thread.sleep(millis), millis, 10_000); > > I wonder if 10s is enough a tolerance. JDK-8303633 has some sightings of sleep(1000) taking 5.5s. We've since changed all of these to 20s. It might be that 10s is okay, it's just that we seem to run on some test systems (Windows mostly) where we need very high tolerance for tests like this. Yeah, I was wondering about the same when writing the test. I have not seen it failing yet, and it was pretty slow without the parallelization. To handle the ~5s hiccups, I bumped all "times" tests from 10s to 20s. I kept "timeout" tests at the same 5s timeout, because they normally do wait for those 5s. The accidental hiccup would make timeout test accidentally pass, which I think is a fair trade for keeping the test execution times low. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1171789353 From sspitsyn at openjdk.org Wed Apr 19 22:02:20 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 19 Apr 2023 22:02:20 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v3] In-Reply-To: References: Message-ID: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge - 8304444: Reappearance of NULL in jvmtiThreadState.cpp - 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13484/files - new: https://git.openjdk.org/jdk/pull/13484/files/7735ffac..5594635c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=01-02 Stats: 224828 lines in 1318 files changed: 212269 ins; 4015 del; 8544 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From coleenp at openjdk.org Wed Apr 19 22:53:41 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Apr 2023 22:53:41 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags Message-ID: Moved three read-only InstanceKlass flags out of AccessFlags to InstanceKlassFlags, and removed unused and unneeded SA code. Tested with tier1-4. ------------- Commit messages: - 8306474: Move InstanceKlass read-only flags Changes: https://git.openjdk.org/jdk/pull/13545/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13545&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306474 Stats: 44 lines in 10 files changed: 11 ins; 26 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13545.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13545/head:pull/13545 PR: https://git.openjdk.org/jdk/pull/13545 From sspitsyn at openjdk.org Wed Apr 19 23:17:31 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 19 Apr 2023 23:17:31 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread Message-ID: This enhancement adds support of virtual threads to the JVMTI `StopThread` function. 
In preview releases before this enhancement, `StopThread` returned the `JVMTI_ERROR_UNSUPPORTED_OPERATION` error code for virtual threads. The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: > The thread is a suspended virtual thread and the implementation > was unable to throw an asynchronous exception from this frame. A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 Testing: The mach5 tiers 1-6 are in progress. Preliminary test runs were good in general. The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. Also, two JCK JVMTI tests are failing in tier-6: > vm/jvmti/StopThread/stop001/stop00103/stop00103.html > vm/jvmti/StopThread/stop001/stop00103/stop00103a.html These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior.
------------- Commit messages: - fixed trailing spaces - 8306034: add support of virtual threads to JVMTI StopThread Changes: https://git.openjdk.org/jdk/pull/13546/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306034 Stats: 477 lines in 8 files changed: 456 ins; 8 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From jrose at openjdk.org Wed Apr 19 23:31:42 2023 From: jrose at openjdk.org (John R Rose) Date: Wed, 19 Apr 2023 23:31:42 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 22:46:55 GMT, Coleen Phillimore wrote: > Moved three read-only InstanceKlass flags out of AccessFlags to InstanceKlassFlags, and removed unused and unneeded SA code. > Tested with tier1-4. Reviewed. I like to see access flags being slowly emptied out. It was a not-so-good idea (in hindsight) to overload them. src/hotspot/share/oops/instanceKlassFlags.hpp line 52: > 50: flag(is_shared_app_class , 1 << 9) /* defining class loader is app class loader */ \ > 51: flag(has_contended_annotations , 1 << 10) /* has @Contended annotation */ \ > 52: flag(has_localvariable_table , 1 << 11) /* has localvariable information */ I suggest, as a matter of course, keeping the final `\` at the end of the list, and then adding something like `/*end of list*/` at the end. This helps prevent churn on the final line, in such lists. ------------- Marked as reviewed by jrose (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13545#pullrequestreview-1393009587 PR Review Comment: https://git.openjdk.org/jdk/pull/13545#discussion_r1171940561 From jrose at openjdk.org Wed Apr 19 23:39:44 2023 From: jrose at openjdk.org (John R Rose) Date: Wed, 19 Apr 2023 23:39:44 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 23:27:10 GMT, John R Rose wrote: >> Moved three read-only InstanceKlass flags out of AccessFlags to InstanceKlassFlags, and removed unused and unneeded SA code. >> Tested with tier1-4. > > src/hotspot/share/oops/instanceKlassFlags.hpp line 52: > >> 50: flag(is_shared_app_class , 1 << 9) /* defining class loader is app class loader */ \ >> 51: flag(has_contended_annotations , 1 << 10) /* has @Contended annotation */ \ >> 52: flag(has_localvariable_table , 1 << 11) /* has localvariable information */ > > I suggest, as a matter of course, keeping the final `\` at the end of the list, and then adding something like `/*end of list*/` at the end. This helps prevent churn on the final line, in such lists. That should be `\`, in case it didn't come through markdown. Removing the final backslash, like removing the final comma of an `enum`, causes churn. Adding a final comment also makes it easy to visually locate the end of the sequence; it is much clearer than the blank line required by the cpp macro mechanism.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13545#discussion_r1171944831 From cjplummer at openjdk.org Wed Apr 19 23:54:43 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 19 Apr 2023 23:54:43 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 22:54:35 GMT, Serguei Spitsyn wrote: > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. I'll be fixing this using [JDK-8306467](https://bugs.openjdk.org/browse/JDK-8306467), which will be done after the JDWP/JDI spec/impl update, which is being handled by [JDK-8306471](https://bugs.openjdk.org/browse/JDK-8306471). ------------- PR Comment: https://git.openjdk.org/jdk/pull/13546#issuecomment-1515519931 From sspitsyn at openjdk.org Thu Apr 20 00:00:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 20 Apr 2023 00:00:41 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 23:51:53 GMT, Chris Plummer wrote: > I'll be fixing this using [JDK-8306467](https://bugs.openjdk.org/browse/JDK-8306467), > which will be done after the JDWP/JDI spec/impl update, which is being handled by > [JDK-8306471](https://bugs.openjdk.org/browse/JDK-8306471). Thank you, Chris. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13546#issuecomment-1515523290 From coleenp at openjdk.org Thu Apr 20 00:27:20 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 00:27:20 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags [v2] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 23:36:45 GMT, John R Rose wrote: >> src/hotspot/share/oops/instanceKlassFlags.hpp line 52: >> >>> 50: flag(is_shared_app_class , 1 << 9) /* defining class loader is app class loader */ \ >>> 51: flag(has_contended_annotations , 1 << 10) /* has @Contended annotation */ \ >>> 52: flag(has_localvariable_table , 1 << 11) /* has localvariable information */ >> >> I suggest, as a matter of course, keeping the final `\` at the end of the list, and then adding something like `/*end of list*/` at the end. This helps prevent churn on the final line, in such lists. > > That should be `\`, in case it didn't come through markdown. Removing the final backslash, like removing the final comma of an `enum`, causes churn. Adding a final comment also makes it easy to visually locate the end of the sequence; it is much clearer than the blank line required by the cpp macro mechanism. Thanks John, that was a good suggestion. Yes, I'm clearing out the AccessFlags in lots of small changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13545#discussion_r1171963968 From coleenp at openjdk.org Thu Apr 20 00:27:20 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 00:27:20 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags [v2] In-Reply-To: References: Message-ID: > Moved three read-only InstanceKlass flags out of AccessFlags to InstanceKlassFlags, and removed unused and unneeded SA code. > Tested with tier1-4.
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: John's suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13545/files - new: https://git.openjdk.org/jdk/pull/13545/files/eb1e4139..066798c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13545&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13545&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13545.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13545/head:pull/13545 PR: https://git.openjdk.org/jdk/pull/13545 From kvn at openjdk.org Thu Apr 20 00:37:49 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Apr 2023 00:37:49 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> Message-ID: On Mon, 17 Apr 2023 16:17:30 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests. Remember previous reducible Phis. 
Again got failures in the test on Aarch64 running with -XX:-UseTLAB: testCmpMergeWithNull(boolean,int,int): - Failed comparison: [found] 0 = 2 [given] testCmpMergeWithNull_Second(boolean,int,int) - Failed comparison: [found] 0 = 1 [given] testMergedAccessAfterCallNoWrite(boolean,int,int) - Failed comparison: [found] 2 = 3 [given] testMergedAccessAfterCallWithWrite(boolean,int,int) - Failed comparison: [found] 2 = 3 [given] testNestedObjectsArray(boolean,int,int) - Failed comparison: [found] 2 = 4 [given] ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1515550553 From coleenp at openjdk.org Thu Apr 20 00:41:50 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 00:41:50 GMT Subject: RFR: 8306482: Remove unused Method AccessFlags Message-ID: Please review this small change to remove Method AccessFlags that are unused. These flags were moved to ConstMethod a long time ago. Tested with tier1-4, SA tests locally ------------- Commit messages: - 8306482: Remove unused Method AccessFlags Changes: https://git.openjdk.org/jdk/pull/13549/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13549&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306482 Stats: 27 lines in 4 files changed: 0 ins; 27 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13549.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13549/head:pull/13549 PR: https://git.openjdk.org/jdk/pull/13549 From kvn at openjdk.org Thu Apr 20 00:50:59 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Apr 2023 00:50:59 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> 
<0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> Message-ID: <09b2gzJOWHojxvBpg79PfgQgD0qh56CqHJk484zJX-8=.f1df20ad-c202-4a20-a98b-c334e808eaae@github.com> On Mon, 17 Apr 2023 16:17:30 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 
3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests. Remember previous reducible Phis. 
Also next 2 JVMCI tests failed: compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaType.java # Internal Error (/workspace/open/src/hotspot/cpu/x86/macroAssembler_x86.cpp:829), pid=2430194, tid=2430218 # fatal error: DEBUG MESSAGE: exact klass and actual klass differ Could be due to [12810](https://git.openjdk.org/jdk/pull/12810) ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1515556571 From kvn at openjdk.org Thu Apr 20 01:07:14 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Apr 2023 01:07:14 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v9] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 16:00:18 GMT, Doug Simon wrote: >> Doug Simon has updated the pull request incrementally with two additional commits since the last revision: >> >> - added breadcrumb in AnnotationParser about considering JVMCI should new annotation element types be added >> - fixed javadoc comment > > Thanks for the reviews @turbanoff , @vnkozlov and @jddarcy. 
@dougxc I see next 2 JVMCI tests failed when run with `-XX:TypeProfileLevel=222` on all platforms (our stress testing): compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaType.java # Internal Error (/workspace/open/src/hotspot/cpu/x86/macroAssembler_x86.cpp:829), pid=2430194, tid=2430218 # fatal error: DEBUG MESSAGE: exact klass and actual klass differ Current thread (0x00007f9ff8460480): JavaThread "MainThread" [_thread_in_Java, id=2430218, stack(0x00007f9fe06b7000,0x00007f9fe07b8000)] Stack: [0x00007f9fe06b7000,0x00007f9fe07b8000], sp=0x00007f9fe07b44e0, free space=1013k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x12b8d35] MacroAssembler::debug64(char*, long, long*)+0x45 (macroAssembler_x86.cpp:829) J 1935 c1 sun.reflect.annotation.AnnotationParser.parseAnnotation2(Ljava/nio/ByteBuffer;Ljdk/internal/reflect/ConstantPool;Ljava/lang/Class;Z[Ljava/lang/Class;)Ljava/lang/annotation/Annotation; java.base at 21-internal (275 bytes) @ 0x00007f9fe10d4cc6 [0x00007f9fe10d4740+0x0000000000000586] [error occurred during error reporting (printing native stack (with source info)) ------------- PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1515565983 From dholmes at openjdk.org Thu Apr 20 01:49:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 01:49:44 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags [v2] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 00:27:20 GMT, Coleen Phillimore wrote: >> Moved three read-only InstanceKlass flags out of AccessFlags to InstanceKlassFlags, and removed unused and unneeded SA code. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > John's suggestion Changes seem fine. Thanks. 
Why was `has_finalizer` not included with this group? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13545#pullrequestreview-1393091073 From dholmes at openjdk.org Thu Apr 20 02:03:42 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 02:03:42 GMT Subject: RFR: 8306482: Remove unused Method AccessFlags In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 00:35:06 GMT, Coleen Phillimore wrote: > Please review this small change to remove Method AccessFlags that are unused. These flags were moved to ConstMethod a long time ago. > Tested with tier1-4, SA tests locally Seems okay. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13549#pullrequestreview-1393100023 From dholmes at openjdk.org Thu Apr 20 02:36:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 02:36:51 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v7] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 19 Apr 2023 19:55:38 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. 
The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Adjust test times Looking much cleaner/simpler. A couple more comments below. Thanks. src/hotspot/os/windows/os_windows.cpp line 5263: > 5261: if (nanos_left > 0) { > 5262: millis++; > 5263: } You could simplify this to: if (nanos > millis * NANOSECS_PER_MILLISEC) { millis++; } src/hotspot/os/windows/park_windows.hpp line 53: > 51: void park () ; > 52: void unpark () ; > 53: int park(jlong millis); While you are here could you get rid of the spaces on park/unpark before the parentheses - thanks. src/hotspot/share/runtime/javaThread.cpp line 1983: > 1981: } > 1982: > 1983: bool JavaThread::sleep(jlong millis) { Perhaps a comment: // Internal convenience function for millisecond resolution sleeps. src/java.base/share/classes/java/lang/Thread.java line 1: > 1: /* Why doesn't `Thread.sleep(long millis)` simply call `Thread.sleep(millis, 0)` instead of duplicating most of the code? 
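The round-up suggested in the review for os_windows.cpp can be sketched outside HotSpot as follows. This is only an illustration, not the actual VM code: the class name `SleepRounding` and the method `roundUpToMillis` are hypothetical, the constant mirrors HotSpot's `NANOSECS_PER_MILLISEC`, and non-negative input is assumed.

```java
// Illustrative sketch only; SleepRounding and roundUpToMillis are hypothetical names.
public final class SleepRounding {
    // Mirrors the HotSpot constant of the same name.
    static final long NANOSECS_PER_MILLISEC = 1_000_000L;

    // Converts a non-negative nanosecond duration to whole milliseconds,
    // rounding up whenever a sub-millisecond remainder is left over.
    static long roundUpToMillis(long nanos) {
        long millis = nanos / NANOSECS_PER_MILLISEC;
        if (nanos > millis * NANOSECS_PER_MILLISEC) {
            millis++;
        }
        return millis;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToMillis(0L));         // exact zero stays zero
        System.out.println(roundUpToMillis(1L));         // any remainder rounds up
        System.out.println(roundUpToMillis(1_000_000L)); // exact millisecond is kept
        System.out.println(roundUpToMillis(1_500_000L)); // 1.5 ms rounds up
    }
}
```

The single comparison replaces a separately computed remainder variable, which is the simplification the comment asks for.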
------------- PR Review: https://git.openjdk.org/jdk/pull/13225#pullrequestreview-1393107848 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172011960 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172013223 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172014393 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172017034 From dholmes at openjdk.org Thu Apr 20 03:04:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 03:04:44 GMT Subject: RFR: 8257967: JFR: Event for loaded agents In-Reply-To: References: Message-ID: On Tue, 7 Feb 2023 18:43:32 GMT, Erik Gahlin wrote: > Could I have a review of an event for native and Java agents. > > Testing: > - tier1 > - tier2 > - jdk/jdk/jfr/* > - 100 * TestLoadedAgent.java > > Rationale for event fields: > - name: to identify problematic third party agents > - options: to identify problematic options, such as too generic filters or conflicting port numbers, that could impact application behavior > - dynamic: if an agent was loaded by a user using jcmd, which could explain why problem only occur on some server instances > - java: if the agent is native, it could explain crashes > - loadTime: to understand if application problem correlates with the time the agent was loaded > > Alternatives: > - I considered making a non-periodic event that is emitted when the agent is loaded, but agents loaded at startup are then unlikely to make it into the recording. > - If it is a JPLIS agent, the jar name is used instead of "instrument". This is likely what users expect when using VirtualMachine::loadAgent(name, options) API or -javaagent:name=options > - I considered making all accesses to the agentLibrary list protected by a mutex, but it could potentially lead to deadlocks when non-JFR code iterates over the list. I think this is better fixed as a separate issue, if deemed necessary. So far so good. 
JFR iterates the list at every chunk rotation and not from the attach thread, so the risk for an unsafe access is real and synchronization is needed. > - When a JPLIS agent was loaded by attach, it wasn't detected properly, so I added a name check so I could pass true to the constructor and make TestLoadedAgent pass. It would possible to make all detection of JPLIS using the name "instrument" and not rely on a boolean in the constructor, but I deemed it outside the scope for the enhancement. > - I considered using Ticks::now() instead of os::javaTimeMillis() as time source, but TSC is not initialized this early. It could perhaps be fixed by moving the initialization earlier, but it might have other side effects, and is better done outside this enhancement. > > > Thanks > Erik @egahlin this PR targets JDK-8257967, but that was already fixed by https://github.com/openjdk/jdk/pull/12923. I think that PR used the wrong bug id! @mgronlun ------------- PR Comment: https://git.openjdk.org/jdk/pull/12460#issuecomment-1515643848 From lmesnik at openjdk.org Thu Apr 20 03:09:44 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 20 Apr 2023 03:09:44 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 22:54:35 GMT, Serguei Spitsyn wrote: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. 
If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tiers 1-6 are in progress. > Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in tier-6: >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. Changes requested by lmesnik (Reviewer). test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 38: > 36: * > 37: * @requires vm.continuations > 38: * @compile --enable-preview -source ${jdk.version} StopThreadTest.java --enable-preview is not needed anymore. test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 48: > 46: static final int JVMTI_ERROR_NONE = 0; > 47: static final int THREAD_NOT_SUSPENDED = 13; > 48: static final int PASSED = 0; I think it would be better not to use statuses and just throw an exception after the first failure. Usually the results of the other test cases might be corrupted and the output just confuses the user.
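The fail-fast style suggested here could look roughly like the sketch below. It is not taken from StopThreadTest.java: the class `FailFastCheck` and the helper `checkStatus` are hypothetical, and the two constants are simply the values quoted above.

```java
// Illustrative sketch only; FailFastCheck and checkStatus are hypothetical names.
public final class FailFastCheck {
    // Values as quoted from the test under review.
    static final int JVMTI_ERROR_NONE = 0;
    static final int THREAD_NOT_SUSPENDED = 13;

    // Throws on the first mismatch instead of recording a status and continuing,
    // so later (possibly corrupted) results never reach the log.
    static void checkStatus(String what, int expected, int actual) {
        if (expected != actual) {
            throw new RuntimeException(what + ": expected error code " + expected
                                       + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // A matching code passes silently.
        checkStatus("stopThread on suspended thread", JVMTI_ERROR_NONE, JVMTI_ERROR_NONE);
        // A mismatch aborts the test immediately.
        try {
            checkStatus("stopThread on running thread", THREAD_NOT_SUSPENDED, JVMTI_ERROR_NONE);
        } catch (RuntimeException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```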
test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/libStopThreadTest.cpp line 135: > 133: check_jvmti_status(jni, err, "prepareAgent: Failed in JVMTI SetBreakpoint"); > 134: > 135: err = jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_BREAKPOINT, NULL); We have set_event_notification_mode() in jvmti_h. test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/libStopThreadTest.cpp line 151: > 149: Java_StopThreadTest_resumeThread(JNIEnv *jni, jclass cls, jthread thread) { > 150: LOG("Main: resumeThread\n"); > 151: jvmtiError err = jvmti->ResumeThread(thread); There is suspend_thread/resume_thread() in jvmti.h ------------- PR Review: https://git.openjdk.org/jdk/pull/13546#pullrequestreview-1393079770 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1171991173 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1172026145 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1171995417 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1171994042 From dholmes at openjdk.org Thu Apr 20 03:10:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 03:10:55 GMT Subject: RFR: 8306282: Build failure linux-arm32-open-cmp-baseline after JDK-8257967 [v3] In-Reply-To: References: <7Mtu4sDUSEqgO7yYhYPVT-aU0XmHOObSRnPOQqDcIic=.443775c9-0d30-4c6f-9815-0133f57a2e00@github.com> <8bWAmer1c40K83zNOqk4t96rJ8xPG-lVyjBXpeGTp2M=.a4fead0c-23c2-4575-b5f7-5782c874fa5d@github.com> Message-ID: On Wed, 19 Apr 2023 11:09:02 GMT, Markus Grönlund wrote: > It follows the same pattern as for other jvmti*.cpp files also excluded via make/hotspot/lib/jvmFeatures.gmk. For example, jvmtiExport.cpp. Okay, yes I see that now - I was thrown by the fact that the original versions of these functions in arguments.hpp/cpp did not need this but they were in fact not conditional despite being JVMTI specific.
Seems to me there should be a bunch of code in arguments.cpp that is conditionalized on INCLUDE_JVMTI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13512#issuecomment-1515648566 From kbarrett at openjdk.org Thu Apr 20 05:07:42 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 20 Apr 2023 05:07:42 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> Message-ID: <4mSYzcXKe-qzoXuUBeKHw6nzxEewqPXPkfeZ1kTExHQ=.adb1f609-ffad-43c7-9067-c9fdbeea4071@github.com> On Wed, 19 Apr 2023 14:46:48 GMT, Afshin Zafari wrote: >> src/hotspot/share/prims/jvmtiRawMonitor.hpp line 114: >> >>> 112: >>> 113: // Non-aborting operator new >>> 114: void* operator new(size_t size, const std::nothrow_t& nothrow_constant) throw() { >> >> Hm, now I'm wondering why isn't an `operator delete` to go with this? Or are these objects >> never deleted? Otherwise I'd have thought we'd get the same mismatched new/delete warning >> you encountered elsewhere. If they're never supposed to be deleted, then giving `operator delete` >> a deleted definition here seems appropriate, to prevent accidentally calling the CHeapObj function. > > This `operator new` just calls the `CHeapObj::operator new` with nothrow argument. So changing the caller will call the right one in `CHeapObj`. This object is deleted in > https://github.com/openjdk/jdk/blob/c738c8ea3e9fda87abb03acb599a2433a344db09/src/hotspot/share/prims/jvmtiEnv.cpp#L3699 > and this will call the `CHeapObj::operator delete` which is the right one. So this `operator new` is not needed since I changed the caller. A possible reason for keeping this `operator new` is to force the use of null return for oom for this class. If it's removed then we have the option of (perhaps unintentionally) using the terminating allocator. 
That doesn't seem like a _strong_ reason to keep it, but someone more familiar with jvmti stuff might want to weigh in. If it is kept, then I think it should have a corresponding `operator delete`, else it at least looks odd. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1172085220 From dholmes at openjdk.org Thu Apr 20 07:00:42 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 07:00:42 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: <4mSYzcXKe-qzoXuUBeKHw6nzxEewqPXPkfeZ1kTExHQ=.adb1f609-ffad-43c7-9067-c9fdbeea4071@github.com> References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> <4mSYzcXKe-qzoXuUBeKHw6nzxEewqPXPkfeZ1kTExHQ=.adb1f609-ffad-43c7-9067-c9fdbeea4071@github.com> Message-ID: On Thu, 20 Apr 2023 05:05:17 GMT, Kim Barrett wrote: >> This `operator new` just calls the `CHeapObj::operator new` with nothrow argument. So changing the caller will call the right one in `CHeapObj`. This object is deleted in >> https://github.com/openjdk/jdk/blob/c738c8ea3e9fda87abb03acb599a2433a344db09/src/hotspot/share/prims/jvmtiEnv.cpp#L3699 >> and this will call the `CHeapObj::operator delete` which is the right one. So this `operator new` is not needed since I changed the caller. > > A possible reason for keeping this `operator new` is to force the use of null return for oom for this class. > If it's removed then we have the option of (perhaps unintentionally) using the terminating allocator. > That doesn't seem like a _strong_ reason to keep it, but someone more familiar with jvmti stuff might > want to weigh in. If it is kept, then I think it should have a corresponding `operator delete`, else it at > least looks odd. JVMTI does not abort on OOM it reports an error, so we definitely do not want a terminating allocator! 
jvmtiError JvmtiEnv::CreateRawMonitor(const char* name, jrawMonitorID* monitor_ptr) { JvmtiRawMonitor* rmonitor = new JvmtiRawMonitor(name); NULL_CHECK(rmonitor, JVMTI_ERROR_OUT_OF_MEMORY); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1172159575 From dholmes at openjdk.org Thu Apr 20 07:03:44 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 07:03:44 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock [v3] In-Reply-To: <83vphN_tVH1YDMrlNwvDTBPypjcPhH0YTEM5DBXP7Eg=.8fcfb9ef-29d8-4b48-8e76-cb3796d9545b@github.com> References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> <83vphN_tVH1YDMrlNwvDTBPypjcPhH0YTEM5DBXP7Eg=.8fcfb9ef-29d8-4b48-8e76-cb3796d9545b@github.com> Message-ID: On Wed, 19 Apr 2023 19:33:43 GMT, Coleen Phillimore wrote: >> This patch releases the InvokeMethodTable_lock while creating a method handle intrinsic. If there's a race, it frees a Method created by racing thread. The logic is simple but uses the deallocate_list infrastructure that's mostly used for redefinition making it less rare. With Dacapo2009, this adds about 20 Methods + constant pools to the list. Also the method has to call nmethod->flush which is assumed to be something only GC calls. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into invoketable > - Update method deallocate_contents comments. > - 8305252: make_method_handle_intrinsic may call java code under a lock Thanks for the updated comments and answering my queries. Looks fine. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13308#pullrequestreview-1393326566 From dholmes at openjdk.org Thu Apr 20 07:20:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 07:20:34 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 14:38:20 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> RISCV update > > src/hotspot/share/services/management.cpp line 1131: > >> 1129: if (maxDepth == 0) { >> 1130: // No stack trace to dump so we do not need to stop the world. >> 1131: // Since we never do the VM op here we must set the threads list. > > You may want to add a little more to this comment after L1131: > > // Since we are not stopping the world, the data we gather here > // may change the moment after we return it. Nit: even if you are stopping the world you have restarted it by the time you return so the data can change anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1172177158 From mgronlun at openjdk.org Thu Apr 20 08:16:48 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 20 Apr 2023 08:16:48 GMT Subject: RFR: 8257967: JFR: Event for loaded agents In-Reply-To: References: Message-ID: <6VZMqTiDFLcw7I6BBanuGJvb8IAGG_BSaXA6NiGSwGs=.f99e8fa1-e4d4-4833-89ee-78c503f6a504@github.com> On Thu, 20 Apr 2023 03:01:57 GMT, David Holmes wrote: > @egahlin this PR targets JDK-8257967, but that was already fixed by https://github.com/openjdk/jdk/pull/12923. I think that PR used the wrong bug id! @mgronlun > > Yes, I took over the initial work explored by Erik. This PR should be closed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12460#issuecomment-1515912369 From shade at openjdk.org Thu Apr 20 08:18:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Apr 2023 08:18:08 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v7] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Thu, 20 Apr 2023 02:14:53 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust test times > > src/hotspot/os/windows/os_windows.cpp line 5263: > >> 5261: if (nanos_left > 0) { >> 5262: millis++; >> 5263: } > > You could simplify this to: > > if (nanos > millis * NANOSECS_PER_MILLISEC) { > millis++; > } True, simplified. > src/hotspot/os/windows/park_windows.hpp line 53: > >> 51: void park () ; >> 52: void unpark () ; >> 53: int park(jlong millis); > > While you are here could you get rid of the spaces on park/unpark before the parentheses - thanks. Right, did so. > src/hotspot/share/runtime/javaThread.cpp line 1983: > >> 1981: } >> 1982: >> 1983: bool JavaThread::sleep(jlong millis) { > > Perhaps a comment: > > // Internal convenience function for millisecond resolution sleeps. Added! > src/java.base/share/classes/java/lang/Thread.java line 1: > >> 1: /* > > Why doesn't `Thread.sleep(long millis)` simply call `Thread.sleep(millis, 0)` instead of duplicating most of the code? That would change the stack depth, for which some tests are sensitive. https://github.com/openjdk/jdk/pull/13225#issuecomment-1514461529. I think merging `sleep` implementations is something to do in a targeted cleanup, which would handle those test updates too. 
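For readers who want to observe the behavior this PR is about, a minimal sketch along these lines measures how long a requested 500 us sleep actually takes. The class `SubMillisSleep` and the method `timedSleepNanos` are hypothetical names used only for illustration; the measured value depends on the JDK build and the platform timers, so no particular figure should be expected.

```java
// Illustrative sketch only; SubMillisSleep and timedSleepNanos are hypothetical names.
public final class SubMillisSleep {
    // Sleeps via Thread.sleep(millis, nanos) and returns the elapsed wall time in nanoseconds.
    static long timedSleepNanos(long millis, int nanos) {
        long start = System.nanoTime();
        try {
            Thread.sleep(millis, nanos);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller; the elapsed time is still reported.
            Thread.currentThread().interrupt();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Request 500 microseconds. JDKs that round nanos up to 1 ms should report
        // roughly a millisecond or more; with sub-millisecond support the figure
        // may be noticeably lower.
        long elapsed = timedSleepNanos(0, 500_000);
        System.out.println("requested 500 us, slept " + (elapsed / 1_000) + " us");
    }
}
```

Running this before and after the change on the same machine is a quick way to see whether the rounding to 1 ms is still in effect.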
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172237713 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172238273 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172237000 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172236374 From shade at openjdk.org Thu Apr 20 08:24:10 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Apr 2023 08:24:10 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v8] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. 
Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13225/files - new: https://git.openjdk.org/jdk/pull/13225/files/8617b5ce..6a2bcf0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=06-07 Stats: 5 lines in 3 files changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From duke at openjdk.org Thu Apr 20 08:42:01 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 20 Apr 2023 08:42:01 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> Message-ID: <-t-PXOEJs-9wbK-IZVrthyrtUTLfOhI5C3l2IgfmQvU=.cc4287db-ffbe-4c54-94b8-42ce57db87bd@github.com> On Wed, 19 Apr 2023 11:49:49 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8305590: Remove nothrow exception specifications from operator new > > src/hotspot/share/jfr/utilities/jfrAllocation.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2014, 2023, Oracle and/or its affiliates. All rights reserved. > > This appears to be the only change to this file so should be reverted. Thanks for catching this. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1172267019 From duke at openjdk.org Thu Apr 20 08:41:58 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 20 Apr 2023 08:41:58 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3] In-Reply-To: References: Message-ID: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> > - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. > > - The `-fcheck-new` is removed from the gcc compile flags. > > - The `operator new` and `operator delete` are deleted from `StackObj`. > > - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. > - The `Thread::operator new`with and without `null` return are removed. > > ### Tests > local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8305590: Remove nothrow exception specifications from operator new ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13498/files - new: https://git.openjdk.org/jdk/pull/13498/files/d2d75e7f..bc0d3069 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13498&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13498&range=01-02 Stats: 7 lines in 3 files changed: 0 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13498/head:pull/13498 PR: https://git.openjdk.org/jdk/pull/13498 From duke at openjdk.org Thu Apr 20 08:42:03 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 20 Apr 2023 08:42:03 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3] 
In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 18:25:55 GMT, Kim Barrett wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8305590: Remove nothrow exception specifications from operator new > > src/hotspot/share/jfr/utilities/jfrAllocation.hpp line 58: > >> 56: NOINLINE void* operator new(size_t size); >> 57: NOINLINE void* operator new (size_t size, const std::nothrow_t& nothrow_constant) throw(); >> 58: NOINLINE void* operator new [](size_t size); > > The changes to JfrCHeapObj are not correct, because these allocators currently _can_ return null. > Their implementation is just to return the result of calling the non-throwing allocator. That's probably > not an ideal implementation. Either the declaration needs to be left as-is or the implementation changed. declaration kept as it is. > src/hotspot/share/memory/allocation.hpp line 287: > >> 285: private: >> 286: void* operator new(size_t size) throw() = delete; >> 287: void* operator new [](size_t size) throw() = delete; > > The lingering nothrow exception-specs here are just clutter and can be removed. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1172265672 PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1172265938 From duke at openjdk.org Thu Apr 20 08:42:03 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 20 Apr 2023 08:42:03 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> <4mSYzcXKe-qzoXuUBeKHw6nzxEewqPXPkfeZ1kTExHQ=.adb1f609-ffad-43c7-9067-c9fdbeea4071@github.com> Message-ID: <4x0Wm39zfuJNcoQ5PCnW_7coiE5AqQRVMSYqKFW-KUc=.ffc385ac-9baf-4f6d-a65c-9443b495fcd5@github.com> On Thu, 20 Apr 2023 06:58:20 GMT, David Holmes wrote: >> A possible reason for keeping this `operator new` is to force the use of null return for oom for this class. >> If it's removed then we have the option of (perhaps unintentionally) using the terminating allocator. >> That doesn't seem like a _strong_ reason to keep it, but someone more familiar with jvmti stuff might >> want to weigh in. If it is kept, then I think it should have a corresponding `operator delete`, else it at >> least looks odd. > > JVMTI does not abort on OOM it reports an error, so we definitely do not want a terminating allocator! > > jvmtiError > JvmtiEnv::CreateRawMonitor(const char* name, jrawMonitorID* monitor_ptr) { > JvmtiRawMonitor* rmonitor = new JvmtiRawMonitor(name); > NULL_CHECK(rmonitor, JVMTI_ERROR_OUT_OF_MEMORY); The new operator is removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1172266527 From dnsimon at openjdk.org Thu Apr 20 09:12:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 20 Apr 2023 09:12:02 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v9] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 07:27:47 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. 
This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
>
> Doug Simon has updated the pull request incrementally with two additional commits since the last revision:
>
>  - added breadcrumb in AnnotationParser about considering JVMCI should new annotation element types be added
>  - fixed javadoc comment

I can reproduce this locally as well but really don't know where to start in terms of debugging this. Can you please provide hints as to what may cause this failure in C1 compiled code?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1515986311

From aboldtch at openjdk.org Thu Apr 20 09:17:42 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 20 Apr 2023 09:17:42 GMT
Subject: RFR: 8305880: Loom: Avoid putting stale object pointers in oops
In-Reply-To:
References:
Message-ID: <2S4qoyznxMR1XEY7BuHfqxgyu22Hb1NXcuiLabYtyaw=.4368f918-cf0d-47ef-bcde-b00f25066962@github.com>

On Wed, 12 Apr 2023 06:25:17 GMT, Stefan Karlsson wrote:

> Generational ZGC has extra verification code for oops, which triggers asserts when it finds stale oops. We have cleaned away some usages of stale oops in the upstream repository (openjdk/jdk), but there are still a couple left in the Loom code. I propose that we rewrite the code, to pave the way for Generational ZGC.
>
> I've tested this by running these patches on top of openjdk/fibers + ZGC. I've also tested this with Skynet + Generational ZGC, where these issues were first found.

lgtm.

-------------

Marked as reviewed by aboldtch (Committer).

PR Review: https://git.openjdk.org/jdk/pull/13439#pullrequestreview-1393562433

From stefank at openjdk.org Thu Apr 20 09:17:43 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 20 Apr 2023 09:17:43 GMT
Subject: RFR: 8305880: Loom: Avoid putting stale object pointers in oops
In-Reply-To:
References:
Message-ID:

On Wed, 12 Apr 2023 06:25:17 GMT, Stefan Karlsson wrote:

> Generational ZGC has extra verification code for oops, which triggers asserts when it finds stale oops. We have cleaned away some usages of stale oops in the upstream repository (openjdk/jdk), but there are still a couple left in the Loom code. I propose that we rewrite the code, to pave the way for Generational ZGC.
>
> I've tested this by running these patches on top of openjdk/fibers + ZGC. I've also tested this with Skynet + Generational ZGC, where these issues were first found.

Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13439#issuecomment-1515995129

From stefank at openjdk.org Thu Apr 20 09:21:53 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 20 Apr 2023 09:21:53 GMT
Subject: Integrated: 8305880: Loom: Avoid putting stale object pointers in oops
In-Reply-To:
References:
Message-ID: <8e1EbVWAxXLWg0DYFLlUTWhfGvTkytYFcFVeZrk1mjU=.415464dc-12ec-4251-8106-1c75ac54af93@github.com>

On Wed, 12 Apr 2023 06:25:17 GMT, Stefan Karlsson wrote:

> Generational ZGC has extra verification code for oops, which triggers asserts when it finds stale oops. We have cleaned away some usages of stale oops in the upstream repository (openjdk/jdk), but there are still a couple left in the Loom code. I propose that we rewrite the code, to pave the way for Generational ZGC.
>
> I've tested this by running these patches on top of openjdk/fibers + ZGC. I've also tested this with Skynet + Generational ZGC, where these issues were first found.

This pull request has now been integrated.
Changeset: 6a7dff30 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/6a7dff30edce7a24400b27bee4d7ddd45eed523d Stats: 48 lines in 10 files changed: 9 ins; 2 del; 37 mod 8305880: Loom: Avoid putting stale object pointers in oops Reviewed-by: eosterlund, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/13439 From shade at openjdk.org Thu Apr 20 09:29:43 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Apr 2023 09:29:43 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v9] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. 
Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Handle overflows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13225/files - new: https://git.openjdk.org/jdk/pull/13225/files/6a2bcf0b..a759556d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=07-08 Stats: 16 lines in 2 files changed: 14 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From shade at openjdk.org Thu Apr 20 09:35:48 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Apr 2023 09:35:48 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v9] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: <0gqzpwJ7mn92hLaHvQVq4YgrslTWWdICKYzpWo8BgOw=.abfafe78-363e-4de8-a79c-387b5de5f524@github.com> On Thu, 20 Apr 2023 09:29:43 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. 
>> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Handle overflows While writing the CSR, I realized we still need to keep the `MAX_SECS` cap at POSIX path for individual sleeps, if only to care for downstream Solaris ports that are still maintained. We also would like to care about accidentally calling `JavaThread::sleep(jlong millis)` with e.g. `max_jlong`. New commit should handle both. 
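
[Editorial sketch, not part of the original message.] The two concerns in the message above, a `millis` value so large that converting it to nanoseconds overflows and a per-sleep cap in seconds, can be illustrated as follows. This is a minimal Java sketch under stated assumptions, not the HotSpot C++ code: the class `SleepMath`, both method names, and the `maxSecs` parameter are hypothetical, and the constant only mirrors the role of `NANOUNITS_PER_MILLIUNIT` quoted in the review.

```java
// Hypothetical illustration only; names are not taken from HotSpot.
final class SleepMath {
    static final long NANOS_PER_MILLI = 1_000_000L;
    static final long NANOS_PER_SEC = 1_000_000_000L;

    /** Converts milliseconds to nanoseconds, saturating at Long.MAX_VALUE instead of overflowing. */
    static long millisToNanosSaturated(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("timeout must be non-negative");
        }
        if (millis > Long.MAX_VALUE / NANOS_PER_MILLI) {
            return Long.MAX_VALUE; // the multiplication below would overflow
        }
        return millis * NANOS_PER_MILLI;
    }

    /** Caps a relative timeout in nanoseconds at maxSecs seconds (maxSecs is an assumed parameter). */
    static long capAtMaxSecs(long nanos, long maxSecs) {
        long capNanos = Math.multiplyExact(maxSecs, NANOS_PER_SEC); // throws if the cap itself overflows
        return Math.min(nanos, capNanos);
    }
}
```

With these definitions, `SleepMath.millisToNanosSaturated(Long.MAX_VALUE)` saturates to `Long.MAX_VALUE`, and a caller would then apply `capAtMaxSecs` before handing the value to the platform wait.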

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1516018689

From kbarrett at openjdk.org Thu Apr 20 09:51:45 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 20 Apr 2023 09:51:45 GMT
Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3]
In-Reply-To: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com>
References: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com>
Message-ID:

On Thu, 20 Apr 2023 08:41:58 GMT, Afshin Zafari wrote:

>> - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` that _do not_ return `nullptr`.
>>
>> - The `-fcheck-new` is removed from the gcc compile flags.
>>
>> - The `operator new` and `operator delete` are deleted from `StackObj`.
>>
>> - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation `AnyObj::operator new`, because gcc complains about that after removing the `-fcheck-new` flag.
>> - The `Thread::operator new` with and without `null` return are removed.
>>
>> ### Tests
>> local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1
>> mach5: tiers 1-5
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
>
>   8305590: Remove nothrow exception specifications from operator new

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13498#pullrequestreview-1393623647 From zcai at openjdk.org Thu Apr 20 10:19:44 2023 From: zcai at openjdk.org (Zixian Cai) Date: Thu, 20 Apr 2023 10:19:44 GMT Subject: RFR: JDK-8306538: Zero variant build failure after JDK-8257967 Message-ID: JDK-8306538: Zero variant build failure after JDK-8257967 ------------- Commit messages: - Define JvmtiAgent::set_os_lib when JVMTI feature is not available Changes: https://git.openjdk.org/jdk/pull/13557/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13557&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306538 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13557/head:pull/13557 PR: https://git.openjdk.org/jdk/pull/13557 From rkennke at openjdk.org Thu Apr 20 11:15:47 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 20 Apr 2023 11:15:47 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor.
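
[Editorial sketch, not part of the original message.] The per-thread lock-stack described so far can be modelled roughly as below. This is a simplified Java model under stated assumptions, not the HotSpot implementation: the class and method names are hypothetical, the CAS on the object header is left out entirely, and a `false` return merely stands in for "take the slow path and inflate to a monitor".

```java
// Hypothetical model of a per-thread lock-stack; it omits the header CAS and inflation.
final class LockStackSketch {
    static final int CAPACITY = 8; // fixed size quoted in the description above

    private final Object[] entries = new Object[CAPACITY];
    private int top = 0;

    /** Answers "does the current thread own this object?" by an identity scan of the small array. */
    boolean contains(Object o) {
        for (int i = 0; i < top; i++) {
            if (entries[i] == o) {
                return true;
            }
        }
        return false;
    }

    /**
     * Fast-path lock attempt. Returns false when the caller would have to fall back
     * to the slow path: the stack is full, or the lock is already held (recursive).
     */
    boolean tryPush(Object o) {
        if (top == CAPACITY || contains(o)) {
            return false;
        }
        entries[top++] = o;
        return true;
    }

    /** Fast-path unlock; this simplified model only handles the most recently pushed object. */
    boolean tryPop(Object o) {
        if (top == 0 || entries[top - 1] != o) {
            return false;
        }
        entries[--top] = null; // clear the slot so the model does not retain the object
        return true;
    }
}
```

The model makes the trade-off in the description visible: ownership checks are a scan over at most eight slots, while overflow and recursion are deliberately pushed out of the fast path.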
It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>  - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>  - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> |  | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> |  | x86_64 |  |  |  | aarch64 |  |  |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> |  | stack-locking | fast-locking |  |  | stack-locking | fast-locking |  |
> | AkkaUct | 841.884 | 836.948 | 0.59% |  | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% |  | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% |  | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% |  | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% |  | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% |  | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% |  | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% |  | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% |  | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% |  | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% |  | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% |  | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% |  | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% |  | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% |  | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% |  | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% |  | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |  | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% |  | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% |  | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% |  | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Remove unnecessary comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/c3486726..5d0a0451

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=61
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=60-61

  Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From shade at openjdk.org Thu Apr 20 12:12:52 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 20 Apr 2023 12:12:52 GMT
Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v9]
In-Reply-To:
References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com>
Message-ID:

On Thu, 20 Apr 2023 12:04:38 GMT, David Holmes wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Handle overflows
>
> src/hotspot/os/posix/os_posix.cpp line 1342:
>
>> 1340: }
>> 1341:
>> 1342: static jlong nanos_to_nanos_bounded(jlong nanos) {
>
> I don't think we actually need this for nanos. There is no overflow potential and the MAX_SECS limit will be handled later.

Hm, "later" where?
Because I see the call to `pthread_condwait_timed` right after we produced abstime using `nanos_to_nanos_bounded`, and this looks like the last opportunity to handle the `MAX_SECS` limit: https://github.com/openjdk/jdk/pull/13225/files#diff-7daa601d72ef74e5281faf8256b537b4ec9c5e5c236b716902592da12fa2aad2R1582 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172492842 From dholmes at openjdk.org Thu Apr 20 12:12:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 12:12:51 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v9] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Thu, 20 Apr 2023 09:29:43 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. 
Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Handle overflows src/hotspot/os/posix/os_posix.cpp line 1342: > 1340: } > 1341: > 1342: static jlong nanos_to_nanos_bounded(jlong nanos) { I don't think we actually need this for nanos. There is no overflow potential and the MAX_SECS limit will be handled later. src/hotspot/share/runtime/javaThread.cpp line 1987: > 1985: jlong nanos; > 1986: if (millis > max_jlong / NANOUNITS_PER_MILLIUNIT) { > 1987: // Conversion to nanos would overflow, saturate at max Good catch! I had assumed `millis_to_nanos` handled this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172488236 PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172490688 From shade at openjdk.org Thu Apr 20 12:19:43 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Apr 2023 12:19:43 GMT Subject: RFR: JDK-8306538: Zero variant build failure after JDK-8257967 In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 10:10:57 GMT, Zixian Cai wrote: > This follows the same style of fix applied in #13512. I noticed this issue when cross-compiling zero slowdebug for riscv64. I confirmed locally that this PR fixes the linking errors. Looks fine and trivial. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13557#pullrequestreview-1393854336 From dholmes at openjdk.org Thu Apr 20 12:22:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 12:22:45 GMT Subject: RFR: JDK-8306538: Zero variant build failure after JDK-8257967 In-Reply-To: References: Message-ID: <-IPiDnCACW5_M5g19QvV4skCGfa3qYdw2T8-vK1SrZ4=.3f79658b-3efb-4ca8-bbf9-20e97a0cf15f@github.com> On Thu, 20 Apr 2023 10:10:57 GMT, Zixian Cai wrote: > This follows the same style of fix applied in #13512. I noticed this issue when cross-compiling zero slowdebug for riscv64. I confirmed locally that this PR fixes the linking errors. I'll approve this to get a quick fix in place but: 1. I can't see why this should be specific to zero. 2. I think the better fix is that os::find_builtin_agent should be guarded by `#if INCLUDE_JVMTI`. Or even moved to a JVMTI file instead. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13557#pullrequestreview-1393859986 From dholmes at openjdk.org Thu Apr 20 12:34:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 12:34:53 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v9] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Thu, 20 Apr 2023 12:09:18 GMT, Aleksey Shipilev wrote: >> src/hotspot/os/posix/os_posix.cpp line 1342: >> >>> 1340: } >>> 1341: >>> 1342: static jlong nanos_to_nanos_bounded(jlong nanos) { >> >> I don't think we actually need this for nanos. There is no overflow potential and the MAX_SECS limit will be handled later. > > Hm, "later" where? 
Because I see the call to `pthread_condwait_timed` right after we produced abstime using `nanos_to_nanos_bounded`, and this looks like the last opportunity to handle the `MAX_SECS` limit:
> https://github.com/openjdk/jdk/pull/13225/files#diff-7daa601d72ef74e5281faf8256b537b4ec9c5e5c236b716902592da12fa2aad2R1582

`to_abstime` will call `calc_rel_time` which checks for MAX_SECS.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172515211

From shade at openjdk.org Thu Apr 20 12:34:54 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 20 Apr 2023 12:34:54 GMT
Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v9]
In-Reply-To:
References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com>
Message-ID:

On Thu, 20 Apr 2023 12:28:56 GMT, David Holmes wrote:

>> Hm, "later" where? Because I see the call to `pthread_condwait_timed` right after we produced abstime using `nanos_to_nanos_bounded`, and this looks like the last opportunity to handle the `MAX_SECS` limit:
>> https://github.com/openjdk/jdk/pull/13225/files#diff-7daa601d72ef74e5281faf8256b537b4ec9c5e5c236b716902592da12fa2aad2R1582
>
> `to_abstime` will call `calc_rel_time` which checks for MAX_SECS.

I see, good! I dropped `nanos_to_nanos_bounded` locally, and I would re-merge from master once GHA-s are healthy again. Thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1172519235

From zcai at openjdk.org Thu Apr 20 12:40:44 2023
From: zcai at openjdk.org (Zixian Cai)
Date: Thu, 20 Apr 2023 12:40:44 GMT
Subject: RFR: JDK-8306538: Zero variant build failure after JDK-8257967
In-Reply-To:
References:
Message-ID:

On Thu, 20 Apr 2023 10:10:57 GMT, Zixian Cai wrote:

> This follows the same style of fix applied in #13512. I noticed this issue when cross-compiling zero slowdebug for riscv64. I confirmed locally that this PR fixes the linking errors.

Agreed. As discussed in #13512, relevant code should be cleaned up.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13557#issuecomment-1516252162

From zcai at openjdk.org Thu Apr 20 12:40:46 2023
From: zcai at openjdk.org (Zixian Cai)
Date: Thu, 20 Apr 2023 12:40:46 GMT
Subject: RFR: JDK-8306538: Zero variant build failure after JDK-8257967
In-Reply-To: <-IPiDnCACW5_M5g19QvV4skCGfa3qYdw2T8-vK1SrZ4=.3f79658b-3efb-4ca8-bbf9-20e97a0cf15f@github.com>
References: <-IPiDnCACW5_M5g19QvV4skCGfa3qYdw2T8-vK1SrZ4=.3f79658b-3efb-4ca8-bbf9-20e97a0cf15f@github.com>
Message-ID:

On Thu, 20 Apr 2023 12:20:05 GMT, David Holmes wrote:

> I think the better fix is that `os::find_builtin_agent` should be guarded by `#if INCLUDE_JVMTI`. Or even moved to a JVMTI file instead.

Right. This is also the pattern used to exclude C1/C2 specific stuff from a zero build. Perhaps we can have a separate JBS issue to track these proposed changes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13557#issuecomment-1516254023

From zcai at openjdk.org Thu Apr 20 12:44:56 2023
From: zcai at openjdk.org (Zixian Cai)
Date: Thu, 20 Apr 2023 12:44:56 GMT
Subject: Integrated: JDK-8306538: Zero variant build failure after JDK-8257967
In-Reply-To:
References:
Message-ID:

On Thu, 20 Apr 2023 10:10:57 GMT, Zixian Cai wrote:

> This follows the same style of fix applied in #13512. I noticed this issue when cross-compiling zero slowdebug for riscv64. I confirmed locally that this PR fixes the linking errors.

This pull request has now been integrated.
Changeset: 33a7978e Author: Zixian Cai Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/33a7978e85c0c2d610828f89fc1389696f55e1f2 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8306538: Zero variant build failure after JDK-8257967 Reviewed-by: shade, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13557 From coleenp at openjdk.org Thu Apr 20 13:14:48 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 13:14:48 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags [v2] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 01:46:46 GMT, David Holmes wrote: > Why was has_finalizer not included with this group? There are 4 flags that aren't included because these flags are referenced from assembly or compiler code. I'm going to write more about this in a comment in the bug as I'm trying to decide how to or whether to move them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13545#issuecomment-1516304294 From sspitsyn at openjdk.org Thu Apr 20 15:48:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 20 Apr 2023 15:48:48 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 01:27:37 GMT, Leonid Mesnik wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. 
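
[Editorial sketch, not part of the original message.] The rules quoted above for `StopThread` on a virtual thread can be summarised as a small decision function. This is a minimal Java sketch of the prose only, not the JVMTI implementation: the class, enum and method names are hypothetical, only the three outcomes named in the description are modelled, and platform threads are out of scope.

```java
// Hypothetical model of the quoted rules for virtual threads; not JVMTI code.
final class StopThreadRules {
    enum Result {
        JVMTI_ERROR_NONE,
        JVMTI_ERROR_THREAD_NOT_SUSPENDED,
        JVMTI_ERROR_OPAQUE_FRAME
    }

    /**
     * @param isCurrent   the target virtual thread is the calling thread
     * @param isSuspended the target virtual thread is suspended
     * @param isMounted   the target virtual thread is mounted on a carrier thread
     */
    static Result check(boolean isCurrent, boolean isSuspended, boolean isMounted) {
        if (isCurrent) {
            return Result.JVMTI_ERROR_NONE;                 // current thread is always supported
        }
        if (!isSuspended) {
            return Result.JVMTI_ERROR_THREAD_NOT_SUSPENDED; // neither current nor suspended
        }
        if (!isMounted) {
            return Result.JVMTI_ERROR_OPAQUE_FRAME;         // suspended while unmounted
        }
        return Result.JVMTI_ERROR_NONE;                     // suspended while mounted
    }
}
```

For example, a virtual thread suspended at a JVMTI event is mounted, so `check(false, true, true)` yields `JVMTI_ERROR_NONE` in this model.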
>> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tiers 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. >> >> Also, two JCK JVMTI tests are failing in tier 6: >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior. > > test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 38: > >> 36: * >> 37: * @requires vm.continuations >> 38: * @compile --enable-preview -source ${jdk.version} StopThreadTest.java > > --enable-preview is not needed anymore. Fixed, thanks. > test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/libStopThreadTest.cpp line 135: > >> 133: check_jvmti_status(jni, err, "prepareAgent: Failed in JVMTI SetBreakpoint"); >> 134: >> 135: err = jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_BREAKPOINT, NULL); > > We have set_event_notification_mode() in jvmti_h. Fixed, thanks. 
> test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/libStopThreadTest.cpp line 151: > >> 149: Java_StopThreadTest_resumeThread(JNIEnv *jni, jclass cls, jthread thread) { >> 150: LOG("Main: resumeThread\n"); >> 151: jvmtiError err = jvmti->ResumeThread(thread); > > there is suspend_thread/resume_thread() in jvmti.h Fixed, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1172784838 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1172785542 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1172785177 From sspitsyn at openjdk.org Thu Apr 20 16:40:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 20 Apr 2023 16:40:48 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tiers 1-6 are in progress. 
> Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in tier 6: >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed review comments on new test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/bde41a00..0b26f42c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=00-01 Stats: 39 lines in 2 files changed: 9 ins; 13 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From sspitsyn at openjdk.org Thu Apr 20 16:42:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 20 Apr 2023 16:42:44 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 02:46:58 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed review comments on new test > > test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 48: > >> 46: static final int JVMTI_ERROR_NONE = 0; >> 47: static final int THREAD_NOT_SUSPENDED = 13; >> 48: static final int PASSED = 0; > > I think it would be better not to use 
statuses and just throw an exception after the first failure. Usually the results of the other test cases might be corrupted, and the output just confuses the user. I've made some changes to throw RuntimeException right away in the Main thread. However, this status is still needed for failures from the TestTask thread. Sync protocol with the TestTask thread needs to remain. Otherwise, the process hangs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1172847943 From kvn at openjdk.org Thu Apr 20 16:57:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Apr 2023 16:57:03 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v9] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 07:27:47 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... 
>> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - added breadcrumb in AnnotationParser about considering JVMCI should new annotation element types be added > - fixed javadoc comment Filed [JDK-8306581](https://bugs.openjdk.org/browse/JDK-8306581) ------------- PR Comment: https://git.openjdk.org/jdk/pull/12810#issuecomment-1516653533 From shade at openjdk.org Thu Apr 20 17:37:44 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Apr 2023 17:37:44 GMT Subject: RFR: 8305767: HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 23:03:02 GMT, William Kemper wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). I have a general comment about this. It looks to me that the new method is actually bulk-add-er? So it should be e.g.: class NumberSeq { ... public virtual void add(NumberSeq& other) { ... 
} // adds all points from another number sequence Also, `clear_this` should probably be handled in a separate method (call). ------------- PR Review: https://git.openjdk.org/jdk/pull/13395#pullrequestreview-1394476025 From coleenp at openjdk.org Thu Apr 20 18:15:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 18:15:55 GMT Subject: RFR: 8305252: make_method_handle_intrinsic may call java code under a lock [v3] In-Reply-To: <83vphN_tVH1YDMrlNwvDTBPypjcPhH0YTEM5DBXP7Eg=.8fcfb9ef-29d8-4b48-8e76-cb3796d9545b@github.com> References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> <83vphN_tVH1YDMrlNwvDTBPypjcPhH0YTEM5DBXP7Eg=.8fcfb9ef-29d8-4b48-8e76-cb3796d9545b@github.com> Message-ID: On Wed, 19 Apr 2023 19:33:43 GMT, Coleen Phillimore wrote: >> This patch releases the InvokeMethodTable_lock while creating a method handle intrinsic. If there's a race, it frees a Method created by racing thread. The logic is simple but uses the deallocate_list infrastructure that's mostly used for redefinition making it less rare. With Dacapo2009, this adds about 20 Methods + constant pools to the list. Also the method has to call nmethod->flush which is assumed to be something only GC calls. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into invoketable > - Update method deallocate_contents comments. > - 8305252: make_method_handle_intrinsic may call java code under a lock Talking with @fisk makes me think the nmethod->flush is unsafe in the long term. The nmethod->flush takes out the CodeCache_lock, but deoptimization uses a RelaxedCompiledMethodIterator to iterate over nmethods. It only takes out the CodeCache_lock when calling next(). 
It assumes that the nmethod it returns won't be concurrently destroyed by GC, but this code also destroys an nmethod. This code is only called during a dedicated safepoint so is safe from deoptimization for now. But ClassLoaderDataGraph::purge can be called concurrently. CLDG::purge doesn't call Method::deallocate_contents because the metadata is released in bulk, but if this intrinsic Method is on the deallocate list we call free_deallocate_list_C_heap_structures() which doesn't flush the nmethod for this Method. So an nmethod in the code cache would point to a deleted Method. But it can't call flush() because this isn't in a safepoint and violate the code cache walking assumptions. The only ClassLoaderData that have these Methods on the deallocate_lists are ones that are never unloaded so this is safe for now. The "for now" safety is why I'm closing this PR and going with a different approach. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13308#issuecomment-1516750155 From coleenp at openjdk.org Thu Apr 20 18:15:57 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 18:15:57 GMT Subject: Withdrawn: 8305252: make_method_handle_intrinsic may call java code under a lock In-Reply-To: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> References: <54-vdzhwXXQxxaoXRfObZ9V2waJWJlVpINRtUTgkVVs=.cfd09f55-361d-4188-a5b6-cd44aff8f86d@github.com> Message-ID: On Mon, 3 Apr 2023 19:33:27 GMT, Coleen Phillimore wrote: > This patch releases the InvokeMethodTable_lock while creating a method handle intrinsic. If there's a race, it frees a Method created by racing thread. The logic is simple but uses the deallocate_list infrastructure that's mostly used for redefinition making it less rare. With Dacapo2009, this adds about 20 Methods + constant pools to the list. Also the method has to call nmethod->flush which is assumed to be something only GC calls. > > Tested with tier1-4. 
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13308 From alanb at openjdk.org Thu Apr 20 18:19:45 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 20 Apr 2023 18:19:45 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> On Thu, 20 Apr 2023 16:40:48 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tiers 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. 
>> >> Also, two JCK JVMTI tests are failing in tier 6: >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments on new test src/hotspot/share/prims/jvmtiEnv.cpp line 1197: > 1195: if (is_virtual && !is_JavaThread_current(java_thread, thread_oop)) { > 1196: if (!JvmtiVTSuspender::is_vthread_suspended(thread_oop)) { > 1197: return JVMTI_ERROR_THREAD_NOT_SUSPENDED; Does JvmtiVTSuspender::is_vthread_suspended work for the alternative virtual thread implementation (-XX:+VMContinuations)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1172941986 From coleenp at openjdk.org Thu Apr 20 18:39:48 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 18:39:48 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3] In-Reply-To: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> References: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> Message-ID: On Thu, 20 Apr 2023 08:41:58 GMT, Afshin Zafari wrote: >> - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` that _do not_ return `nullptr`. >> >> - The `-fcheck-new` is removed from the gcc compile flags. >> >> - The `operator new` and `operator delete` are deleted from `StackObj`. >> >> - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation `AnyObj::operator new`, because gcc complains about that after removing the `-fcheck-new` flag. >> - The `Thread::operator new` with and without `null` return are removed. 
>> >> ### Tests >> local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8305590: Remove nothrow exception specifications from operator new This is a nice cleanup. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13498#pullrequestreview-1394563701 From coleenp at openjdk.org Thu Apr 20 18:39:50 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 18:39:50 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v2] In-Reply-To: <4x0Wm39zfuJNcoQ5PCnW_7coiE5AqQRVMSYqKFW-KUc=.ffc385ac-9baf-4f6d-a65c-9443b495fcd5@github.com> References: <619AH3LBFaBdG1odgp4kpuCjm6ujfOivOUtoDYHdhCw=.96e5693b-e8c1-4923-88e4-11715889b822@github.com> <4mSYzcXKe-qzoXuUBeKHw6nzxEewqPXPkfeZ1kTExHQ=.adb1f609-ffad-43c7-9067-c9fdbeea4071@github.com> <4x0Wm39zfuJNcoQ5PCnW_7coiE5AqQRVMSYqKFW-KUc=.ffc385ac-9baf-4f6d-a65c-9443b495fcd5@github.com> Message-ID: On Thu, 20 Apr 2023 08:36:53 GMT, Afshin Zafari wrote: >> JVMTI does not abort on OOM; it reports an error, so we definitely do not want a terminating allocator! >> >> jvmtiError >> JvmtiEnv::CreateRawMonitor(const char* name, jrawMonitorID* monitor_ptr) { >> JvmtiRawMonitor* rmonitor = new JvmtiRawMonitor(name); >> NULL_CHECK(rmonitor, JVMTI_ERROR_OUT_OF_MEMORY); > > The new operator is removed. We decide this at the call site though by adding the nothrow parameter. Adding an overloaded operator new without a nothrow parameter that we're not supposed to call seems very marginally useful, i.e., not useful. 
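For readers following the call-site point above, here is a minimal sketch of choosing a null-returning allocation with `std::nothrow` and turning a failed allocation into an error code. It is not HotSpot code: `RawMonitorSketch`, `SketchError`, and `create_monitor` are hypothetical names made up for illustration, and the sketch uses the standard global `operator new(std::size_t, const std::nothrow_t&)` rather than any class-specific HotSpot allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for an agent-visible object; not the HotSpot class.
struct RawMonitorSketch {
  const char* name;
  explicit RawMonitorSketch(const char* n) : name(n) {}
};

// Hypothetical error codes, standing in for a JVMTI-style error enum.
enum SketchError { SKETCH_OK = 0, SKETCH_OUT_OF_MEMORY = 1 };

// The caller opts into a null-returning allocation by passing std::nothrow,
// then reports failure as an error code instead of terminating.
SketchError create_monitor(const char* name, RawMonitorSketch** out) {
  RawMonitorSketch* m = new (std::nothrow) RawMonitorSketch(name);
  if (m == nullptr) {
    return SKETCH_OUT_OF_MEMORY;
  }
  *out = m;
  return SKETCH_OK;
}
```

The design choice illustrated here is that the allocation policy is visible where the object is created, so no extra overload is needed just to document which form must not be called.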
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13498#discussion_r1172958617 From matsaave at openjdk.org Thu Apr 20 19:03:45 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 20 Apr 2023 19:03:45 GMT Subject: RFR: 8306482: Remove unused Method AccessFlags In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 00:35:06 GMT, Coleen Phillimore wrote: > Please review this small change to remove Method AccessFlags that are unused. These flags were moved to ConstMethod a long time ago. > Tested with tier1-4, SA tests locally Nice cleanup, LGTM! ------------- Marked as reviewed by matsaave (Committer). PR Review: https://git.openjdk.org/jdk/pull/13549#pullrequestreview-1394595045 From egahlin at openjdk.org Thu Apr 20 19:13:03 2023 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 20 Apr 2023 19:13:03 GMT Subject: Withdrawn: 8257967: JFR: Event for loaded agents In-Reply-To: References: Message-ID: On Tue, 7 Feb 2023 18:43:32 GMT, Erik Gahlin wrote: > Could I have a review of an event for native and Java agents. > > Testing: > - tier1 > - tier2 > - jdk/jdk/jfr/* > - 100 * TestLoadedAgent.java > > Rationale for event fields: > - name: to identify problematic third party agents > - options: to identify problematic options, such as too generic filters or conflicting port numbers, that could impact application behavior > - dynamic: if an agent was loaded by a user using jcmd, which could explain why a problem only occurs on some server instances > - java: if the agent is native, it could explain crashes > - loadTime: to understand if an application problem correlates with the time the agent was loaded > > Alternatives: > - I considered making a non-periodic event that is emitted when the agent is loaded, but agents loaded at startup are then unlikely to make it into the recording. > - If it is a JPLIS agent, the jar name is used instead of "instrument". 
This is likely what users expect when using VirtualMachine::loadAgent(name, options) API or -javaagent:name=options > - I considered making all accesses to the agentLibrary list protected by a mutex, but it could potentially lead to deadlocks when non-JFR code iterates over the list. I think this is better fixed as a separate issue, if deemed necessary. So far so good. JFR iterates the list at every chunk rotation and not from the attach thread, so the risk for an unsafe access is real and synchronization is needed. > - When a JPLIS agent was loaded by attach, it wasn't detected properly, so I added a name check so I could pass true to the constructor and make TestLoadedAgent pass. It would be possible to make all detection of JPLIS using the name "instrument" and not rely on a boolean in the constructor, but I deemed it outside the scope for the enhancement. > - I considered using Ticks::now() instead of os::javaTimeMillis() as time source, but TSC is not initialized this early. It could perhaps be fixed by moving the initialization earlier, but it might have other side effects, and is better done outside this enhancement. > > > Thanks > Erik This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12460 From coleenp at openjdk.org Thu Apr 20 19:16:52 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 19:16:52 GMT Subject: RFR: 8306482: Remove unused Method AccessFlags In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 00:35:06 GMT, Coleen Phillimore wrote: > Please review this small change to remove Method AccessFlags that are unused. These flags were moved to ConstMethod a long time ago. > Tested with tier1-4, SA tests locally Thanks David and Matias. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13549#issuecomment-1516819018 From coleenp at openjdk.org Thu Apr 20 19:16:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 19:16:53 GMT Subject: Integrated: 8306482: Remove unused Method AccessFlags In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 00:35:06 GMT, Coleen Phillimore wrote: > Please review this small change to remove Method AccessFlags that are unused. These flags were moved to ConstMethod a long time ago. > Tested with tier1-4, SA tests locally This pull request has now been integrated. Changeset: afd2501f Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/afd2501fcc9f8ccb4993a6565d68b882e5130688 Stats: 27 lines in 4 files changed: 0 ins; 27 del; 0 mod 8306482: Remove unused Method AccessFlags Reviewed-by: dholmes, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/13549 From coleenp at openjdk.org Thu Apr 20 19:23:04 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 19:23:04 GMT Subject: RFR: 8306474: Move InstanceKlass read-only flags [v2] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 00:27:20 GMT, Coleen Phillimore wrote: >> Moved three read-only InstanceKlass flags out of AccessFlags to InstanceKlassFlags, and removed unused and unneeded SA code. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > John's suggestion Thanks John and David. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13545#issuecomment-1516825031 From coleenp at openjdk.org Thu Apr 20 19:23:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Apr 2023 19:23:06 GMT Subject: Integrated: 8306474: Move InstanceKlass read-only flags In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 22:46:55 GMT, Coleen Phillimore wrote: > Moved three read-only InstanceKlass flags out of AccessFlags to InstanceKlassFlags, and removed unused and unneeded SA code. > Tested with tier1-4. This pull request has now been integrated. Changeset: f6336231 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/f63362310e17ba5c3e415ef3c5bd5f9bd65fd67c Stats: 45 lines in 10 files changed: 12 ins; 26 del; 7 mod 8306474: Move InstanceKlass read-only flags Reviewed-by: jrose, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13545 From cslucas at openjdk.org Thu Apr 20 19:27:58 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 20 Apr 2023 19:27:58 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user? 
I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regression. I also experimented with several applications and didn't see any failure. 
I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Catching up with master Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Fix tests. Remember previous reducible Phis. - Address PR review 3. Some comments and be able to abort compilation. - Merge with Master - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. - Add support for SR'ing some inputs of merges used for field loads - Fix some typos and do some small refactorings. - Merge master - Add support for rematerializing scalar replaced objects participating in allocation merges ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=09 Stats: 2253 lines in 26 files changed: 1992 ins; 108 del; 153 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From cslucas at openjdk.org Thu Apr 20 20:19:51 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 20 Apr 2023 20:19:51 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 20 Apr 2023 19:27:58 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? 
>> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. 
>> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. > - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - Merge master > - Add support for rematerializing scalar replaced objects participating in allocation merges Thank you for testing, Vladimir. I was able to reproduce the IR test failures on AArch64 with -UseTLAB. I'll push a fix later today. 
Looks like the other failures are due to: https://bugs.openjdk.org/browse/JDK-8306581 ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1516893566 From cjplummer at openjdk.org Thu Apr 20 22:35:47 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 20 Apr 2023 22:35:47 GMT Subject: RFR: 8233725: ProcessTools.startProcess() has output issues when using an OutputAnalyzer at the same time [v2] In-Reply-To: References: Message-ID: <-G9WskHyBL-VvD-95BHoWsSn9cSiQc5mpUayH2_rUBM=.638204df-fa19-4c70-b51e-d4c7b1eb8290@github.com> On Thu, 20 Apr 2023 18:40:52 GMT, Leonid Mesnik wrote: >> ProcessTools.startProcess() creates process and read it's output error streams. So the any other using of corresponding Process.getInputStream() and Process.getErrorStream() doesn't get process streams. >> >> This fix preserve process streams content and allow to read it after process completion. The another possible solution would be to throw exception when user tries to read already drained streams to fail tests earlier. However it complicates usage of ProcessTools.startProcess() methods. >> >> The regression test has been provided with issue. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > JStackStressTest.java updated. test/jdk/sun/tools/jhsdb/JStackStressTest.java line 86: > 84: } catch (InterruptedException e) { > 85: } > 86: OutputAnalyzer jshellOutput = new OutputAnalyzer(jShellProcess); It's not clear to me how moving this is fixing anything. test/jdk/sun/tools/jstatd/JstatdTest.java line 357: > 355: assertEquals(stdout.size(), 1, "Output should contain one line"); > 356: assertTrue(stdout.get(0).startsWith("jstatd started"), "List should start with 'jstatd started'"); > 357: assertNotEquals(output.getExitValue(), 0, Before your fix, was the "jstatd started" line being missed because of this bug. 
test/lib/jdk/test/lib/process/ProcessTools.java line 750: > 748: public InputStream getInputStream() { > 749: try { > 750: waitFor(); With this added `waitFor()` the assumption now is that the caller doesn't intent to do incremental reads of the output as the process generates it. For example, if the test were to send some command to the process and then want to read the resulting output, and do this repeatedly, it won't able to use the InputStream to do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13560#discussion_r1173141642 PR Review Comment: https://git.openjdk.org/jdk/pull/13560#discussion_r1173139688 PR Review Comment: https://git.openjdk.org/jdk/pull/13560#discussion_r1173146951 From iklam at openjdk.org Thu Apr 20 22:41:45 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Apr 2023 22:41:45 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block [v8] In-Reply-To: References: Message-ID: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1000 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). 
For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge branch 'master' into 8298048-combine-cds-heap-to-single-region-PUSH - Merge branch 'master' into 8298048-combine-cds-heap-to-single-region-PUSH - Fixed assert in runtime/cds/appcds/SharedArchiveConsistency.java - Removal of JFR custom closed/open archive region types - Remove g1 full gc skip marking optimization - Some comment updates - Move g1collectedheap archive related regions together in the cpp file - Factor out region/range iteration - Fix comment - Ioi fix - ... 
and 12 more: https://git.openjdk.org/jdk/compare/f6336231...e8041d50 ------------- Changes: https://git.openjdk.org/jdk/pull/13284/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13284&range=07 Stats: 3252 lines in 83 files changed: 159 ins; 2446 del; 647 mod Patch: https://git.openjdk.org/jdk/pull/13284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13284/head:pull/13284 PR: https://git.openjdk.org/jdk/pull/13284 From dholmes at openjdk.org Thu Apr 20 22:57:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Apr 2023 22:57:45 GMT Subject: RFR: 8233725: ProcessTools.startProcess() has output issues when using an OutputAnalyzer at the same time [v2] In-Reply-To: <-G9WskHyBL-VvD-95BHoWsSn9cSiQc5mpUayH2_rUBM=.638204df-fa19-4c70-b51e-d4c7b1eb8290@github.com> References: <-G9WskHyBL-VvD-95BHoWsSn9cSiQc5mpUayH2_rUBM=.638204df-fa19-4c70-b51e-d4c7b1eb8290@github.com> Message-ID: On Thu, 20 Apr 2023 22:31:55 GMT, Chris Plummer wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> JStackStressTest.java updated. > > test/lib/jdk/test/lib/process/ProcessTools.java line 750: > >> 748: public InputStream getInputStream() { >> 749: try { >> 750: waitFor(); > > With this added `waitFor()` the assumption now is that the caller doesn't intent to do incremental reads of the output as the process generates it. For example, if the test were to send some command to the process and then want to read the resulting output, and do this repeatedly, it won't able to use the InputStream to do that. I have to agree with Chris. You are changing a fundamental property of this API. We no longer just start the process, we are forced to wait for it to complete. 
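The trade-off raised in this review can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `BufferedDrain` and its methods are hypothetical names and are not part of `ProcessTools` or the JDK test library. The sketch drains a stream on a background thread and keeps a copy, so the content survives for a later reader, but the reader has to block until end-of-stream and so cannot consume output incrementally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not the actual ProcessTools code: drains a stream on a
// background thread and keeps a copy of everything it reads.
final class BufferedDrain {
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();
    private final Thread pump;
    private volatile IOException failure;

    BufferedDrain(InputStream source) {
        pump = new Thread(() -> {
            try {
                // Blocks until the source reaches end-of-stream, which for a
                // process stream means the process has closed its output.
                source.transferTo(copy);
            } catch (IOException e) {
                failure = e;
            }
        });
        pump.setDaemon(true);
        pump.start();
    }

    // Waits for the pump to finish, then returns everything it captured.
    // Because of the join, callers cannot observe output as it is produced.
    byte[] awaitBytes() {
        try {
            pump.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        if (failure != null) {
            throw new UncheckedIOException(failure);
        }
        return copy.toByteArray();
    }

    // Replays the captured content as a fresh stream for a late reader.
    InputStream replay() {
        return new ByteArrayInputStream(awaitBytes());
    }
}
```

A test that sends a command to the child process and then wants to read the reply would hang in `awaitBytes()`, which is the behavior change the reviewers object to.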
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13560#discussion_r1173158259

From lmesnik at openjdk.org Thu Apr 20 23:05:45 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Thu, 20 Apr 2023 23:05:45 GMT
Subject: RFR: 8233725: ProcessTools.startProcess() has output issues when using an OutputAnalyzer at the same time [v2]
In-Reply-To: <-G9WskHyBL-VvD-95BHoWsSn9cSiQc5mpUayH2_rUBM=.638204df-fa19-4c70-b51e-d4c7b1eb8290@github.com>
References: <-G9WskHyBL-VvD-95BHoWsSn9cSiQc5mpUayH2_rUBM=.638204df-fa19-4c70-b51e-d4c7b1eb8290@github.com>
Message-ID:

On Thu, 20 Apr 2023 22:22:06 GMT, Chris Plummer wrote:

>> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>>
>> JStackStressTest.java updated.
>
> test/jdk/sun/tools/jhsdb/JStackStressTest.java line 86:
>
>> 84: } catch (InterruptedException e) {
>> 85: }
>> 86: OutputAnalyzer jshellOutput = new OutputAnalyzer(jShellProcess);
>
> It's not clear to me how moving this is fixing anything.

This is the main issue with the current approach. The OutputAnalyzer tries to read from streams that become available only when the process finishes. This test interacts with the process, so it just hangs waiting for process completion.

> test/jdk/sun/tools/jstatd/JstatdTest.java line 357:
>
>> 355: assertEquals(stdout.size(), 1, "Output should contain one line");
>> 356: assertTrue(stdout.get(0).startsWith("jstatd started"), "List should start with 'jstatd started'");
>> 357: assertNotEquals(output.getExitValue(), 0,
>
> Before your fix, was the "jstatd started" line being missed because of this bug.

Yep. The output was "empty" before the fix.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13560#discussion_r1173155279
PR Review Comment: https://git.openjdk.org/jdk/pull/13560#discussion_r1173155433

From lmesnik at openjdk.org Thu Apr 20 23:05:48 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Thu, 20 Apr 2023 23:05:48 GMT
Subject: RFR: 8233725: ProcessTools.startProcess() has output issues when using an OutputAnalyzer at the same time [v2]
In-Reply-To:
References: <-G9WskHyBL-VvD-95BHoWsSn9cSiQc5mpUayH2_rUBM=.638204df-fa19-4c70-b51e-d4c7b1eb8290@github.com>
Message-ID: <_dsCvusmXp4WGZ-9RZiihf92Rqd0_lX-udKls-ZGKnE=.6c09a4d0-e5ae-4676-89c3-9206213d4628@github.com>

On Thu, 20 Apr 2023 22:54:49 GMT, David Holmes wrote:

>> test/lib/jdk/test/lib/process/ProcessTools.java line 750:
>>
>>> 748: public InputStream getInputStream() {
>>> 749: try {
>>> 750: waitFor();
>>
>> With this added `waitFor()` the assumption now is that the caller doesn't intent to do incremental reads of the output as the process generates it. For example, if the test were to send some command to the process and then want to read the resulting output, and do this repeatedly, it won't able to use the InputStream to do that.
>
> I have to agree with Chris. You are changing a fundamental property of this API. We no longer just start the process, we are forced to wait for it to complete.

Exactly. I added a note about the implementation to the javadoc to make it clear. I don't see any good solution to this problem. The only other solution I can see is to throw an exception here, forcing the user to use lineConsumer. However, any usage of OutputAnalyzer with startProcess() would then clearly and quickly fail. Also, `public static Process startProcess(String name, ProcessBuilder processBuilder)` could be modified to allow reading the process streams. In that case the test should drain the streams itself to avoid a hang. However, I don't see any good way to implement this method correctly if the process streams have already been read.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13560#discussion_r1173161520 From duke at openjdk.org Fri Apr 21 02:37:04 2023 From: duke at openjdk.org (SUN Guoyun) Date: Fri, 21 Apr 2023 02:37:04 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16] In-Reply-To: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> References: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> Message-ID: <3rkPRibD2-ZVCzw62DT1EpmeoHaeLBhIFuOClAXRGvE=.85f13a21-bdbd-4cad-88fc-d3c704a6c2ae@github.com> On Tue, 28 Mar 2023 19:50:36 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich, RISCV was provided by @DingliZhang and @zifeihan, and S390x by @offamitkumar. 
>> >> This change supports the following platforms: x86, aarch64, PPC, RISCV, and S390x > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > s390x NULL to nullptr src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2233: > 2231: > 2232: __ load_resolved_indy_entry(cache, index); > 2233: __ membar(MacroAssembler::AnyAny); Why is the AnyAny barrier used here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1173246912 From lmesnik at openjdk.org Fri Apr 21 04:45:04 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 21 Apr 2023 04:45:04 GMT Subject: Withdrawn: 8233725: ProcessTools.startProcess() has output issues when using an OutputAnalyzer at the same time In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 16:09:29 GMT, Leonid Mesnik wrote: > ProcessTools.startProcess() creates process and read it's output error streams. So the any other using of corresponding Process.getInputStream() and Process.getErrorStream() doesn't get process streams. > > This fix preserve process streams content and allow to read it after process completion. The another possible solution would be to throw exception when user tries to read already drained streams to fail tests earlier. However it complicates usage of ProcessTools.startProcess() methods. > > The regression test has been provided with issue. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/13560 From sspitsyn at openjdk.org Fri Apr 21 06:08:43 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Apr 2023 06:08:43 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> Message-ID: On Thu, 20 Apr 2023 18:17:19 GMT, Alan Bateman wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed review comments on new test > > src/hotspot/share/prims/jvmtiEnv.cpp line 1197: > >> 1195: if (is_virtual && !is_JavaThread_current(java_thread, thread_oop)) { >> 1196: if (!JvmtiVTSuspender::is_vthread_suspended(thread_oop)) { >> 1197: return JVMTI_ERROR_THREAD_NOT_SUSPENDED; > > Does JvmtiVTSuspender::is_vthread_suspended work for the alternative virtual thread implementation (-XX:+VMContinuations)? Nice catch - fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173336527 From dholmes at openjdk.org Fri Apr 21 06:40:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 21 Apr 2023 06:40:58 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 16:40:48 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. 
If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests has been updated to adopt to new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tears 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. >> >> Also, two JCK JVMTI tests are failing in the tier-6 : >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments on new test src/hotspot/share/prims/jvmtiEnv.cpp line 1200: > 1198: } > 1199: if (java_thread == nullptr) { // unmounted virtual thread > 1200: return JVMTI_ERROR_OPAQUE_FRAME; Where is the check for "suspended at an event" that otherwise results in `JVMTI_ERROR_OPAQUE_FRAME`? 
test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 33: > 31: * Its method run() invokes the following methods: > 32: * - method A() that is blocked on a monitor > 33: * - method B() that is stopped at a brakepoint s/brakepoint/breakpoint/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173352162 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173356185 From fjiang at openjdk.org Fri Apr 21 06:43:06 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 21 Apr 2023 06:43:06 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v16] In-Reply-To: <3rkPRibD2-ZVCzw62DT1EpmeoHaeLBhIFuOClAXRGvE=.85f13a21-bdbd-4cad-88fc-d3c704a6c2ae@github.com> References: <5YPjVZsQAVVL9PHaGu7Mc7kNE0WvPrsiCubKSGzKxvk=.78e8f8fb-1005-4bd7-9ad1-d1dd3aacbe04@github.com> <3rkPRibD2-ZVCzw62DT1EpmeoHaeLBhIFuOClAXRGvE=.85f13a21-bdbd-4cad-88fc-d3c704a6c2ae@github.com> Message-ID: <4UK8s-Hj57YFE1RKMi5LiA-ZuvTn2-cQ7uPZRASZF_E=.1ade79d2-3e3f-486b-8ac2-8dc459f42ffc@github.com> On Fri, 21 Apr 2023 02:33:37 GMT, SUN Guoyun wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> s390x NULL to nullptr > > src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2233: > >> 2231: >> 2232: __ load_resolved_indy_entry(cache, index); >> 2233: __ membar(MacroAssembler::AnyAny); > > Why is the AnyAny barrier used here? Hi @sunny868, I'm working on removing these unnecessary barriers. RISC-V port uses more conservative barriers like this for some reasons (e.g.: [1][2][3]), we can just remove them. 1. https://github.com/openjdk/jdk/blob/36ec05d52a79185d8c6669713fd17933128c032a/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3438-L3443 2. https://github.com/openjdk/jdk/blob/36ec05d52a79185d8c6669713fd17933128c032a/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3558-L3563 3. 
https://github.com/openjdk/jdk/blob/36ec05d52a79185d8c6669713fd17933128c032a/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3614-L3619

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1173362912

From alanb at openjdk.org Fri Apr 21 07:15:53 2023
From: alanb at openjdk.org (Alan Bateman)
Date: Fri, 21 Apr 2023 07:15:53 GMT
Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v7]
In-Reply-To:
References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com>
Message-ID:

On Thu, 20 Apr 2023 08:12:50 GMT, Aleksey Shipilev wrote:

>> src/java.base/share/classes/java/lang/Thread.java line 1:
>>
>>> 1: /*
>>
>> Why doesn't `Thread.sleep(long millis)` simply call `Thread.sleep(millis, 0)` instead of duplicating most of the code?
>
> That would change the stack depth, for which some tests are sensitive. https://github.com/openjdk/jdk/pull/13225#issuecomment-1514461529. I think merging `sleep` implementations is something to do in a targeted cleanup, which would handle those test updates too.

Once this change is in, I'd like to do some refactoring and have a private sleepNanos that the 3 sleep methods call with the nanos value. That will change the stack depth, so a few tests will need updating, but we know what they are now.
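The refactoring described above might look roughly like the following Java sketch. It is an assumption about the shape of the idea, not the code in `java.lang.Thread`: `SleepSketch`, `toNanos`, and the saturation rule are hypothetical, and the private `sleepNanos` simply delegates to `TimeUnit.NANOSECONDS.sleep` so the example is runnable outside the JDK.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the refactoring idea; this is not java.lang.Thread.
final class SleepSketch {
    private static final long NANOS_PER_MILLI = 1_000_000L;

    static void sleep(long millis) throws InterruptedException {
        sleepNanos(toNanos(millis, 0));
    }

    static void sleep(long millis, int nanos) throws InterruptedException {
        sleepNanos(toNanos(millis, nanos));
    }

    static void sleep(Duration duration) throws InterruptedException {
        long nanos;
        try {
            nanos = duration.toNanos();
        } catch (ArithmeticException overflow) {
            nanos = Long.MAX_VALUE; // saturate very long durations
        }
        sleepNanos(nanos);
    }

    // Converts (millis, nanos) to total nanoseconds, saturating at Long.MAX_VALUE.
    static long toNanos(long millis, int nanos) {
        if (millis < 0) {
            throw new IllegalArgumentException("millis must not be negative");
        }
        if (nanos < 0 || nanos > 999_999) {
            throw new IllegalArgumentException("nanos must be in 0..999999");
        }
        if (millis > (Long.MAX_VALUE - nanos) / NANOS_PER_MILLI) {
            return Long.MAX_VALUE;
        }
        return millis * NANOS_PER_MILLI + nanos;
    }

    // Single private entry point shared by all three public methods, so each
    // public method adds exactly one frame above the actual sleep.
    private static void sleepNanos(long nanos) throws InterruptedException {
        if (nanos > 0) {
            TimeUnit.NANOSECONDS.sleep(nanos);
        }
    }
}
```

With this shape every public variant has the same call depth, which is the property the stack-depth-sensitive tests would need to be updated for once.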
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13225#discussion_r1173391987 From sspitsyn at openjdk.org Fri Apr 21 07:51:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Apr 2023 07:51:48 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 06:31:52 GMT, David Holmes wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed review comments on new test > > test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 33: > >> 31: * Its method run() invokes the following methods: >> 32: * - method A() that is blocked on a monitor >> 33: * - method B() that is stopped at a brakepoint > > s/brakepoint/breakpoint/ Thanks, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173437781 From sspitsyn at openjdk.org Fri Apr 21 08:03:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Apr 2023 08:03:47 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 06:26:33 GMT, David Holmes wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed review comments on new test > > src/hotspot/share/prims/jvmtiEnv.cpp line 1200: > >> 1198: } >> 1199: if (java_thread == nullptr) { // unmounted virtual thread >> 1200: return JVMTI_ERROR_OPAQUE_FRAME; > > Where is the check for "suspended at an event" that otherwise results in `JVMTI_ERROR_OPAQUE_FRAME`? The JVMTI `StopThread` spec has this description: > The StopThread function may be used to send an asynchronous > exception to a virtual thread when it is suspended at an event. > An implementation may support sending an asynchronous exception > to a suspended virtual thread in other cases. 
> . . . > JVMTI_ERROR_OPAQUE_FRAME: > The thread is a suspended virtual thread and the implementation > was unable to throw an asynchronous exception from this frame. This update supports all suspended mounted cases of virtual threads and returns OPAQUE_FRAME only if the target virtual thread is suspended and unmounted. But we avoid using the mount/unmount terms in the JVMTI spec. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173451827 From sspitsyn at openjdk.org Fri Apr 21 08:09:55 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Apr 2023 08:09:55 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v3] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests has been updated to adopt to new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tears 1-6 are in progress. > Preliminary test runs were good in general. 
> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in the tier-6 : >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: corrections for BoundVirtualThread and test typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/0b26f42c..d2cc010e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=01-02 Stats: 20 lines in 4 files changed: 16 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From duke at openjdk.org Fri Apr 21 09:54:47 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 21 Apr 2023 09:54:47 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3] In-Reply-To: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> References: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> Message-ID: On Thu, 20 Apr 2023 08:41:58 GMT, Afshin Zafari wrote: >> - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. >> >> - The `-fcheck-new` is removed from the gcc compile flags. >> >> - The `operator new` and `operator delete` are deleted from `StackObj`. 
>> >> - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. >> - The `Thread::operator new`with and without `null` return are removed. >> >> ### Tests >> local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8305590: Remove nothrow exception specifications from operator new Thanks for the reviews and the helpful discussions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13498#issuecomment-1517574843 From duke at openjdk.org Fri Apr 21 09:57:53 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Fri, 21 Apr 2023 09:57:53 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay Message-ID: As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. This is immensely useful for investigating time-to-safepoint issues in low latency space. 
------------- Commit messages: - JDK-8305506 Added support for fractional values of SafepointTimeoutDelay Changes: https://git.openjdk.org/jdk/pull/13373/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305506 Stats: 9 lines in 3 files changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13373/head:pull/13373 PR: https://git.openjdk.org/jdk/pull/13373 From robilad at openjdk.org Fri Apr 21 09:57:54 2023 From: robilad at openjdk.org (Dalibor Topic) Date: Fri, 21 Apr 2023 09:57:54 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 13:23:40 GMT, Wojciech Kudla wrote: > As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. > This is immensely useful for investigating time-to-safepoint issues in low latency space. Hi, please send an e-Mail to dalibor.topic at oracle.com so that I can mark your account as verified. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1508891615 From mdoerr at openjdk.org Fri Apr 21 10:34:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Apr 2023 10:34:43 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 13:23:40 GMT, Wojciech Kudla wrote: > As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. > This is immensely useful for investigating time-to-safepoint issues in low latency space. Changing the type of a product flag requires a CSR (https://wiki.openjdk.org/display/csr/CSR+FAQs). 
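When a fractional millisecond delay is converted to nanoseconds, the order of the cast and the multiplication matters. The Java sketch below illustrates this with hypothetical names (`TimeoutConversion`, `truncatedFirst`, `scaledFirst`); it is not the HotSpot code, and it assumes only that a cast binds more tightly than `*`, which holds in both Java and C++.

```java
// Hypothetical names for illustration; NANOS_PER_MILLI stands in for the
// nanoseconds-per-millisecond constant used in the discussion.
final class TimeoutConversion {
    static final long NANOS_PER_MILLI = 1_000_000L;

    // The cast applies to delayMillis alone, so the delay is truncated to
    // whole milliseconds before it is scaled.
    static long truncatedFirst(double delayMillis) {
        return (long) delayMillis * NANOS_PER_MILLI;
    }

    // The multiplication happens in floating point, and only the final
    // nanosecond value is truncated.
    static long scaledFirst(double delayMillis) {
        return (long) (delayMillis * NANOS_PER_MILLI);
    }
}
```

For a delay of 0.5 ms, `truncatedFirst` yields 0 ns while `scaledFirst` yields 500000 ns, so only the second form preserves a sub-millisecond threshold.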
src/hotspot/share/runtime/safepoint.cpp line 382: > 380: // Set the limit time, so that it can be compared to see if this has taken > 381: // too long to complete. > 382: safepoint_limit_time = SafepointTracing::start_of_safepoint() + (jlong)SafepointTimeoutDelay * NANOSECS_PER_MILLISEC; I think it should be `(SafepointTimeoutDelay * NANOSECS_PER_MILLISEC)` before converting to jlong. ------------- PR Review: https://git.openjdk.org/jdk/pull/13373#pullrequestreview-1395517750 PR Review Comment: https://git.openjdk.org/jdk/pull/13373#discussion_r1173610701 From duke at openjdk.org Fri Apr 21 13:09:37 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 21 Apr 2023 13:09:37 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. > > This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. > > By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. 
Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Updated after review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13477/files - new: https://git.openjdk.org/jdk/pull/13477/files/c2dd899c..3b4c0fa3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13477&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13477&range=00-01 Stats: 65 lines in 8 files changed: 4 ins; 37 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13477/head:pull/13477 PR: https://git.openjdk.org/jdk/pull/13477 From dholmes at openjdk.org Fri Apr 21 13:12:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 21 Apr 2023 13:12:51 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 07:58:45 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnv.cpp line 1200: >> >>> 1198: } >>> 1199: if (java_thread == nullptr) { // unmounted virtual thread >>> 1200: return JVMTI_ERROR_OPAQUE_FRAME; >> >> Where is the check for "suspended at an event" that otherwise results in `JVMTI_ERROR_OPAQUE_FRAME`? > > The JVMTI `StopThread` spec has this description: >> The StopThread function may be used to send an asynchronous >> exception to a virtual thread when it is suspended at an event. >> An implementation may support sending an asynchronous exception >> to a suspended virtual thread in other cases. >> . . . >> JVMTI_ERROR_OPAQUE_FRAME: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > This update supports all suspended mounted cases of virtual threads and returns OPAQUE_FRAME only if the target virtual thread is suspended and unmounted. > But we avoid using the mount/unmount terms in the JVMTI spec. What does "suspended at an event" mean? 
As a programmer trying to use this how am I supposed to know when it can be used without getting an error? I find it very surprising that the error would occur with an unmounted thread - having a VT throw when it was remounted seems the most natural way to implement this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173749491 From duke at openjdk.org Fri Apr 21 13:19:52 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 21 Apr 2023 13:19:52 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 10:05:10 GMT, Richard Reingruber wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated after review > > src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 267: > >> 265: intptr_t *sp, *fp; >> 266: if (FKind::interpreted) { >> 267: intptr_t offset = *f.addr_at(ijava_idx(locals)); > > I'd prefer a more specific name. Maybe `locals_offset` or `local0_offset`? Changed the name to `locals_offset` (in all platforms). > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1092: > >> 1090: assert(heap_frame_bottom == heap_frame_top + fsize, ""); >> 1091: >> 1092: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. > > Typo > Suggestion: > > // Some architectures (like AArch64/PPC64/RISC-V) add padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. Fixed > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2156: > >> 2154: assert(!f.is_heap_frame(), "should not be"); >> 2155: >> 2156: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. 
> > Suggestion: > > // Some architectures (like AArch64/PPC64/RISC-V) add padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173753626 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173755462 PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173756029 From duke at openjdk.org Fri Apr 21 13:19:55 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 21 Apr 2023 13:19:55 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: <45-ONQI9dSdGx8chKpms07dh_TfXJUWWCAsYiAd32rM=.cd749431-c12e-42ab-a607-e4ca60bce69b@github.com> <8gSnJ7pOgAbvd6pjxrA3DBtnJ4tC-G6LbHGClDmEKhU=.8ebbe4e2-784f-41a5-8d9f-13bc65de0eee@github.com> <5Lh1b1RiwJy5YfgQcQhmNy-ufET1y_gxidS-ldEMFh4=.abbcb2f8-0022-4334-a21f-f70c7f4bad92@github.com> Message-ID: On Mon, 17 Apr 2023 19:13:33 GMT, Fredrik Bredberg wrote: >> Is it possible to get an equivalent but platform independent version of the assertion? >> Something like `assert(f.interpreter_frame_local_at(0) == stack_frame_bottom - 1, "");` might work. >> It could replace the call of `set_interpreter_frame_bottom()`. >> After all with this pr no platform will ever have to actually set the interpreter frame bottom so it would be good to at least rename the method. > > Sounds like a plan. I'll look into it. Replaced the call to `set_interpreter_frame_bottom` in `ThawBase::recurse_thaw_interpreted_frame` with: `assert(f.interpreter_frame_local_at(0) == stack_frame_bottom - 1, "invalid frame bottom"); `Then removed the `set_interpreter_frame_bottom` method from all platforms. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173751267 From shade at openjdk.org Fri Apr 21 13:21:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Apr 2023 13:21:09 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v10] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. 
> > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Drop nanos_to_nanos_bounded - Handle overflows - More review comments - Adjust test times - Windows again - Windows fixes: align(...) is only for power-of-two alignments - Adjust assert - Replace (park|sleep)_millis back with just (park|sleep) - More review touchups - ... and 13 more: https://git.openjdk.org/jdk/compare/5a00617b...a0c35f45 ------------- Changes: https://git.openjdk.org/jdk/pull/13225/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=09 Stats: 254 lines in 11 files changed: 226 ins; 9 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From duke at openjdk.org Fri Apr 21 13:23:50 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 21 Apr 2023 13:23:50 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: <4e8KcPoKR30cttDfzPPwWCH1BNLje1EyVJKSjyMRHXY=.e60bcac8-4bd8-431a-8e5d-e5519718f0d0@github.com> On Mon, 17 Apr 2023 19:02:12 GMT, Patricio Chilano Mateo wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated after review > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2152: > >> 2150: >> 2151: assert((stack_frame_bottom >= stack_frame_top + fsize) && >> 2152: (stack_frame_bottom <= stack_frame_top + fsize + 1), ""); // internal alignment on aarch64 > > Since we didn't add any new padding shouldn't this assert now be stack_frame_bottom == stack_frame_top + fsize? 
You're right @pchilano, it should. But since the line just above is: `const int fsize = heap_frame_bottom - heap_frame_top;` I thought that there was no use in keeping the `assert` at all, so I removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173760286 From duke at openjdk.org Fri Apr 21 13:29:52 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 21 Apr 2023 13:29:52 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 12:01:05 GMT, Andrew Haley wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated after review > > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 149: > >> 147: // on AARCH64, we may insert padding between the locals and the rest of the frame >> 148: // (see TemplateInterpreterGenerator::generate_normal_entry, and AbstractInterpreter::layout_activation) >> 149: // since we freeze the padding word (see recurse_freeze_interpreted_frame) in order to keep the same relativized > > Suggestion: > > // because we freeze the padding word (see recurse_freeze_interpreted_frame) in order to keep the same relativized > > ... for clarity. Fixed (in all platforms).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173763746 From duke at openjdk.org Fri Apr 21 13:29:55 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 21 Apr 2023 13:29:55 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: <4GITjfS1eT5rMwMKT6-ZdSEN6KXZQms8CaKSfK_rAoE=.211c7cd5-bb14-4bfa-b5a0-23ead8251206@github.com> Message-ID: On Tue, 18 Apr 2023 12:48:45 GMT, Andrew Haley wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2157: >> >>> 2155: >>> 2156: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. >>> 2157: // On those architectures we thaw the padding in order to keep the same localized pointer values. >> >> Somehow this suggestion was not delivered... >> Suggestion: >> >> // On those architectures we thaw the padding in order to keep the same relative references. > > This is very unclear. I think it means the same "relative pointers," otherwise known as "offsets." Tried to clarify by changing the comment to: `// On those architectures we freeze the padding in order to keep the same fp-relative offsets in the fixed_frame.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173765981 From alanb at openjdk.org Fri Apr 21 13:38:43 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 21 Apr 2023 13:38:43 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 13:09:58 GMT, David Holmes wrote: >> The JVMTI `StopThread` spec has this description: >>> The StopThread function may be used to send an asynchronous >>> exception to a virtual thread when it is suspended at an event. >>> An implementation may support sending an asynchronous exception >>> to a suspended virtual thread in other cases. >>> . . . 
>>> JVMTI_ERROR_OPAQUE_FRAME: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> This update supports all suspended mounted cases of virtual threads and returns OPAQUE_FRAME only if the target virtual thread is suspended and unmounted. >> But we avoid using the mount/unmount terms in the JVMTI spec. > > What does "suspended at an event" mean? As a programmer trying to use this how am I supposed to know when it can be used without getting an error? > > I find it very surprising that the error would occur with an unmounted thread - having a VT throw when it was remounted seems the most natural way to implement this. I think "suspended at an event" is okay. It means the callback for an event has been triggered and the agent suspended the thread. The typical use-case for JVMTI StopThread is when at a breakpoint or when single stepping and the user asks the debugger to throw some exception so that the code's handling of the exception can been debugged/tested. Debugger and JDWP agent aside, I don't know if there are other agents using this JVMTI function. If there are other and they call this function on some random virtual thread at some random time then the function will fail. One other point around this is that the plan is to have StopThread, ForceEarlyReturn, PopFrame and SetLocalXXX work more consistently. Right now, SetLocalXXX minimally requires a virtual thread be suspended at a breakpoint or single step event. The minimum support can be broader to be suspended at any event. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173774433 From duke at openjdk.org Fri Apr 21 13:39:50 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 21 Apr 2023 13:39:50 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 10:01:15 GMT, Richard Reingruber wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated after review > > src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 510: > >> 508: // we need to set the locals so that the caller of new_stack_frame() can call >> 509: // ContinuationHelper::InterpretedFrame::frame_bottom >> 510: // copy relativized locals from the heap frame > > Maybe reduce the comment? > > // we need to copy the locals so that the caller of new_stack_frame() can call > // ContinuationHelper::InterpretedFrame::frame_bottom I think the comment is good as it is, because it describes just why we need to set the locals this early (and not the rest of the members in the fixed_frame). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173778601 From cslucas at openjdk.org Fri Apr 21 15:10:49 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 21 Apr 2023 15:10:49 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> Message-ID: <8AmU_ta4meiUmO99Em5bV7XLAV4H9fAcil519yh70fU=.1a28f4a9-a992-43a7-8c4a-d1cf96835963@github.com> On Thu, 20 Apr 2023 00:35:19 GMT, Vladimir Kozlov wrote: > Again got failures in the test on Aarch64 running with -XX:-UseTLAB: > > ``` > testCmpMergeWithNull(boolean,int,int): > - Failed comparison: [found] 0 = 2 [given] > testCmpMergeWithNull_Second(boolean,int,int) > - Failed comparison: [found] 0 = 1 [given] > testMergedAccessAfterCallNoWrite(boolean,int,int) > - Failed comparison: [found] 2 = 3 [given] > testMergedAccessAfterCallWithWrite(boolean,int,int) > - Failed comparison: [found] 2 = 3 [given] > testNestedObjectsArray(boolean,int,int) > - Failed comparison: [found] 2 = 4 [given] > ``` @vnkozlov - The reason for these failures is due to an issue in the test framework ALLOC Regex: https://bugs.openjdk.org/browse/JDK-8306625 . Since only the tests added in this PR are failing due to that problem do you think I should create a separate PR to fix the Regex or just include the fix in this PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1517977996 From iklam at openjdk.org Fri Apr 21 15:32:59 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 21 Apr 2023 15:32:59 GMT Subject: RFR: 8298048: Combine CDS archive heap into a single block In-Reply-To: <3ybhZpVsw5iki0H2OkFswEIqFfEFC0YwNhp9chzu5yU=.8b29a446-3be0-4ae8-a8d0-948003be0411@github.com> References: <3ybhZpVsw5iki0H2OkFswEIqFfEFC0YwNhp9chzu5yU=.8b29a446-3be0-4ae8-a8d0-948003be0411@github.com> Message-ID: On Tue, 11 Apr 2023 21:47:40 GMT, Ashutosh Mehra wrote: >> This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1100 lines in src code and another ~1200 lines of tests. >> >> **Notes for reviewers:** >> - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. >> - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). >> - It might be easier to see the diff with whitespaces off. >> - There are two major changes in the G1 code >> - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) >> - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) >> - Testing changes: >> - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. >> - Many tests that uses this API are removed. 
Most of them were written during early development of CDS archived objects and are no longer relevant. >> >> **Testing:** >> - Mach5 tiers 1 ~ 7 > > cds changes look good! just few nitpicks. Thanks @ashu-mehra @matias9927 @tschatzl for the code review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13284#issuecomment-1518001489 From iklam at openjdk.org Fri Apr 21 15:33:00 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 21 Apr 2023 15:33:00 GMT Subject: Integrated: 8298048: Combine CDS archive heap into a single block In-Reply-To: References: Message-ID: On Mon, 3 Apr 2023 03:32:27 GMT, Ioi Lam wrote: > This PR combines the "open" and "closed" regions of the CDS archive heap into a single region. This significantly simplifies the implementation, making it more compatible with non-G1 collectors. There's a net removal of ~1100 lines in src code and another ~1200 lines of tests. > > **Notes for reviewers:** > - Most of the code changes in CDS are removing the handling of "open" vs "closed" objects. > - Reviewers can start with ArchiveHeapWriter::copy_source_objs_to_buffer(). > - It might be easier to see the diff with whitespaces off. > - There are two major changes in the G1 code > - The archived objects are now stored in the "old" region (see g1CollectedHeap.cpp in [58d720e](https://github.com/openjdk/jdk/pull/13284/commits/58d720e294bb36f21cb88cddde724ed2b9e93770)) > - The majority of the other changes are removal of the "archive" region type (see heapRegionType.hpp). For ease of review, such code is isolated in [a852dfb](https://github.com/openjdk/jdk/pull/13284/commits/a852dfbbf5ff56e035399f7cc3704f29e76697f6) > - Testing changes: > - Now the archived java objects can move, along with the "old" regions that contain them. It's no longer possible to test whether a heap object came from CDS. As a result, the `WhiteBox.isShared(Object o)` API has been removed. > - Many tests that uses this API are removed. 
Most of them were written during early development of CDS archived objects and are no longer relevant. > > **Testing:** > - Mach5 tiers 1 ~ 7 This pull request has now been integrated. Changeset: 723037a7 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/723037a79d2a43b9a1a247d8f81a47907faadab1 Stats: 3252 lines in 83 files changed: 159 ins; 2446 del; 647 mod 8298048: Combine CDS archive heap into a single block Co-authored-by: Thomas Schatzl Reviewed-by: matsaave, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/13284 From pchilanomate at openjdk.org Fri Apr 21 16:06:45 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 21 Apr 2023 16:06:45 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: <4e8KcPoKR30cttDfzPPwWCH1BNLje1EyVJKSjyMRHXY=.e60bcac8-4bd8-431a-8e5d-e5519718f0d0@github.com> References: <4e8KcPoKR30cttDfzPPwWCH1BNLje1EyVJKSjyMRHXY=.e60bcac8-4bd8-431a-8e5d-e5519718f0d0@github.com> Message-ID: On Fri, 21 Apr 2023 13:21:15 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2152: >> >>> 2150: >>> 2151: assert((stack_frame_bottom >= stack_frame_top + fsize) && >>> 2152: (stack_frame_bottom <= stack_frame_top + fsize + 1), ""); // internal alignment on aarch64 >> >> Since we didn't add any new padding shouldn't this assert now be stack_frame_bottom == stack_frame_top + fsize? > > Your're right @pchilano, it should. But since the line just above is: > `const int fsize = heap_frame_bottom - heap_frame_top;` > I thought that there was no use in keeping the `assert` at all, so I removed it. Yes, but it's comparing it with the created stack frame's size so I think it doesn't hurt to keep it. We have an equivalent assert in recurse_freeze_interpreted_frame(). 
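The point about keeping the assert can be sketched with a toy model. This is an illustration only, not HotSpot code: `Extent`, `new_stack_frame` and `thawed_size_matches` are hypothetical names, and the `extra_alignment_words` parameter is an assumed stand-in for any padding that frame creation might add. Because `fsize` is derived from the heap frame while the compared extent belongs to the newly created stack frame, the check is not a tautology.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only (not HotSpot code): a frame is described by word indices.
struct Extent {
  std::size_t top;     // lowest word index of the frame
  std::size_t bottom;  // one past the highest word index
};

// Hypothetical stand-in for creating the destination stack frame. The
// extra_alignment_words knob models a platform that pads the new frame.
Extent new_stack_frame(std::size_t stack_top, std::size_t fsize,
                       std::size_t extra_alignment_words) {
  return Extent{stack_top, stack_top + fsize + extra_alignment_words};
}

// Returns true when the thawed frame has exactly the size of the heap frame.
bool thawed_size_matches(const Extent& heap_frame, std::size_t stack_top,
                         std::size_t extra_alignment_words) {
  // fsize comes from the heap frame ...
  const std::size_t fsize = heap_frame.bottom - heap_frame.top;
  // ... while the extent being checked comes from the created stack frame.
  const Extent stack_frame =
      new_stack_frame(stack_top, fsize, extra_alignment_words);
  return stack_frame.bottom == stack_frame.top + fsize;
}
```

With no extra padding the check holds; if frame creation ever added a word, the same check would fail, which is the kind of regression such an assert would catch.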
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1173943088 From ysr at openjdk.org Fri Apr 21 16:10:44 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 21 Apr 2023 16:10:44 GMT Subject: RFR: 8305767: HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 23:03:02 GMT, William Kemper wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). src/hotspot/share/utilities/numberSeq.cpp line 125: > 123: assert(abs2._alpha == _alpha, "Caution: merge incompatible?"); > 124: > 125: // Until JDK-... is fixed, we taint the decaying statistics JDK-... -> JDK-8298902 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13395#discussion_r1173946779 From cjplummer at openjdk.org Fri Apr 21 16:42:49 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 21 Apr 2023 16:42:49 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v3] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 08:09:55 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. 
If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tiers 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. >> >> Also, two JCK JVMTI tests are failing in tier-6: >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > corrections for BoundVirtualThread and test typos test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 89: > 87: log("\nMain #A: method A() must be blocked on entering a synchronized statement"); > 88: synchronized (TestTask.lock) { > 89: testTaskThread = Thread.ofVirtual().name("TestTaskThread").start(testTask); Do we have other tests that are doing the equivalent testing on platform threads? test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 135: > 133: // StopThread is expected to succeed. > 134: testTask.ensureFinished(); > 135: } I don't see how this is doing any testing. Where is the stopThread(null) call?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173970088 PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1173968926 From alanb at openjdk.org Fri Apr 21 17:15:47 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 21 Apr 2023 17:15:47 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> Message-ID: On Fri, 21 Apr 2023 06:06:16 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnv.cpp line 1197: >> >>> 1195: if (is_virtual && !is_JavaThread_current(java_thread, thread_oop)) { >>> 1196: if (!JvmtiVTSuspender::is_vthread_suspended(thread_oop)) { >>> 1197: return JVMTI_ERROR_THREAD_NOT_SUSPENDED; >> >> Does JvmtiVTSuspender::is_vthread_suspended work for the alternative virtual thread implementation (-XX:-VMContinuations)? > > Nice catch - fixed now. Okay but JvmtiVTMSTransitionDisabler prevent the thread from being resumed for the -XX:-VMContinuations case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1174002110 From ysr at openjdk.org Fri Apr 21 18:03:44 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 21 Apr 2023 18:03:44 GMT Subject: RFR: 8305767: HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 23:03:02 GMT, William Kemper wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). Those are both good questions: > I have a general comment about this. 
It looks to me that the new method is actually bulk-add-er? So it should be e.g.: > > ``` > class NumberSeq { > ... > public > virtual void add(NumberSeq& other) { ... } // adds all points from another number sequence > ``` > Note that the merge direction is reversed. From `this` into `other`. This was based on how it was being used and with `this` expected to be sparser than `other`. > Also, `clear_this` should probably be handled in a separate method (call). The idea was to do the clearing while merging and avoid the extra walk over the bucket if the two calls were split. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13395#issuecomment-1518162614 From sspitsyn at openjdk.org Fri Apr 21 18:36:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Apr 2023 18:36:46 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v3] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 16:33:06 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> corrections for BoundVirtualThread and test typos > > test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 135: > >> 133: // StopThread is expected to succeed. >> 134: testTask.ensureFinished(); >> 135: } > > I don't see how this is doing any testing. Where is the stopThread(null) call? The target virtual thread just continues and invokes method `C()` which sends asynchronous exception with JVMTI `StopThread` to current thread: // This method uses StopThread to send an AssertionError object to // its own thread. It is expected to succeed. static void C() { log("TestTask.C: started"); StopThreadTest.stopThread(Thread.currentThread()); log("TestTask.C: finished"); } I'll try to add a comment to make it clear. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1174062854 From sspitsyn at openjdk.org Fri Apr 21 18:44:45 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Apr 2023 18:44:45 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v3] In-Reply-To: References: Message-ID: <7nu2pq19SIw1me07wf4RnLEVErjYF3tnNnHZNcNWRlg=.8eb78564-8a56-4f59-a692-c7f79aef6ade@github.com> On Fri, 21 Apr 2023 16:34:34 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> corrections for BoundVirtualThread and test typos > > test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 89: > >> 87: log("\nMain #A: method A() must be blocked on entering a synchronized statement"); >> 88: synchronized (TestTask.lock) { >> 89: testTaskThread = Thread.ofVirtual().name("TestTaskThread").start(testTask); > > Do we have other tests that are doing the equivalent testing on platform threads? Do you ask this question to check if we want to extend this test to provide coverage for platform threads as well? The support of platform threads is much simpler. The JVMTI `StopThread` never returns the `THREAD_NOT_SUSPENDED` and `OPAQUE_FRAME` error codes for platform threads. So that this kind of testing is not needed for platform threads. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1174068855 From cjplummer at openjdk.org Fri Apr 21 19:00:48 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 21 Apr 2023 19:00:48 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v3] In-Reply-To: <7nu2pq19SIw1me07wf4RnLEVErjYF3tnNnHZNcNWRlg=.8eb78564-8a56-4f59-a692-c7f79aef6ade@github.com> References: <7nu2pq19SIw1me07wf4RnLEVErjYF3tnNnHZNcNWRlg=.8eb78564-8a56-4f59-a692-c7f79aef6ade@github.com> Message-ID: On Fri, 21 Apr 2023 18:41:41 GMT, Serguei Spitsyn wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/StopThreadTest/StopThreadTest.java line 89: >> >>> 87: log("\nMain #A: method A() must be blocked on entering a synchronized statement"); >>> 88: synchronized (TestTask.lock) { >>> 89: testTaskThread = Thread.ofVirtual().name("TestTaskThread").start(testTask); >> >> Do we have other tests that are doing the equivalent testing on platform threads? > > Do you ask this question to check if we want to extend this test to provide coverage for platform threads as well? > > The support of platform threads is much simpler. The JVMTI `StopThread` never returns the `THREAD_NOT_SUSPENDED` and `OPAQUE_FRAME` error codes for platform threads. So that this kind of testing is not needed for platform threads. For the JDI tests I added, I execute them in both modes, with the appropriate adjustments to account for the errors we expect for virtual threads. We should be testing to make sure that StopThread works with platform threads under a variety of situations.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1174080418 From kvn at openjdk.org Fri Apr 21 19:26:48 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Apr 2023 19:26:48 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: <8AmU_ta4meiUmO99Em5bV7XLAV4H9fAcil519yh70fU=.1a28f4a9-a992-43a7-8c4a-d1cf96835963@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> <8AmU_ta4meiUmO99Em5bV7XLAV4H9fAcil519yh70fU=.1a28f4a9-a992-43a7-8c4a-d1cf96835963@github.com> Message-ID: On Fri, 21 Apr 2023 15:08:21 GMT, Cesar Soares Lucas wrote: > Since only the tests added in this PR are failing due to that problem do you think I should create a separate PR to fix the Regex or just include the fix in this PR? Create separate PR and fix it first. This PR still need review from @iwanowww and it may take time to address additional comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1518247727 From xliu at openjdk.org Fri Apr 21 22:02:50 2023 From: xliu at openjdk.org (Xin Liu) Date: Fri, 21 Apr 2023 22:02:50 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v5] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 14 Apr 2023 20:50:03 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/opto/escape.cpp line 457: >> >>> 455: found_sr_allocate = true; >>> 456: } else { >>> 457: ptn->set_scalar_replaceable(false); >> >> This member function is const. Do we really need to change ptn's property here? >> >> My reading is ophi is profitable as long as we spot any input object which can be eliminated. 
how about you just return at line 455? > > This is actually necessary here. By setting the input to NSR I don't need to later, when performing reduction, check that I can eliminate the node. I can just check that I can scalar replace the input. If I removed this line I'd hit a problem if the merge had an input that is SR but that ME can't eliminate. Okay, I see what you mean. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1174188560 From dcubed at openjdk.org Fri Apr 21 22:05:28 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 21 Apr 2023 22:05:28 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v60] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 20:00:58 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 156 commits: >> >> - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2 >> - A few more LM_ prefixes in 32bit code >> - Replace UseHeavyMonitor with LockingMode == LM_MONITOR >> - Prefix LockingMode constants with LM_* >> - Bunch of comments and typos >> - Don't use NativeAccess in LockStack::contains() >> - RISCV update >> - Put back thread type check in OS::is_lock_owned() >> - Named constants for LockingMode >> - Address David's review comments >> - ... and 146 more: https://git.openjdk.org/jdk/compare/d2ce04bb...d0a448c6 > > Hi there, > what is needed to bring this PR over the approval line? @rkennke - I'm planning to do another crawl thru review next week.
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1518372821 From sspitsyn at openjdk.org Fri Apr 21 23:56:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Apr 2023 23:56:44 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> Message-ID: On Fri, 21 Apr 2023 17:12:32 GMT, Alan Bateman wrote: >> Nice catch - fixed now. > > Okay but JvmtiVTMSTransitionDisabler prevent the thread from being resumed for the -XX:-VMContinuations case? Thank you for the catch. Will check it. I have to extend the test to cover the BoundVirtualThread case enabled with the flag `-XX:-VMContinuations`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1174228621 From sspitsyn at openjdk.org Sat Apr 22 00:12:50 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 22 Apr 2023 00:12:50 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v3] In-Reply-To: References: <7nu2pq19SIw1me07wf4RnLEVErjYF3tnNnHZNcNWRlg=.8eb78564-8a56-4f59-a692-c7f79aef6ade@github.com> Message-ID: On Fri, 21 Apr 2023 18:58:22 GMT, Chris Plummer wrote: >> Do you ask this question to check if we want to extend this test to provide coverage for platform threads as well? >> >> The support of platform threads is much simpler. The JVMTI `StopThread` never returns the `THREAD_NOT_SUSPENDED` and `OPAQUE_FRAME` error codes for platform threads. So that this kind of testing is not needed for platform threads. > > For the JDI tests I added, I execute them in both modes, with the appropriate adjustments to account for the errors we except for virtual threads. We should be testing to make sure that StopThread works with platform threads under a variety of situations. Extending this test to cover platform threads does not look natural and is going to be a little ugly. 
But I can extend it to provide coverage for the BoundVirtualThread case, which is hijacking the platform thread implementation. Would it help? We should have pretty good coverage of the JVMTI `StopThread` for platform threads in the `nsk.jvmti` test suite. It includes: - `stopthrd006` and `stopthrd007` - a number of `scenarios/capability/CM01` tests ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1174232494 From vlivanov at openjdk.org Sat Apr 22 02:00:56 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 22 Apr 2023 02:00:56 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 20 Apr 2023 19:27:58 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads.
I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. 
> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. > - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - Merge master > - Add support for rematerializing scalar replaced objects participating in allocation merges Nice work, Cesar! I like how the patch shapes now. I'm not done with the review yet, but decided to share the comments I have so far. src/hotspot/share/code/debugInfo.cpp line 232: > 230: // If we call select again on the same merge we should return the same result > 231: if (_selected != nullptr) { > 232: return _selected; I'm not sure I understand how it is intended to work. The code below initializes `_selected`, but returns `nullptr` when `selector >= 0`. Subsequent calls will return non-null value. src/hotspot/share/code/debugInfo.cpp line 257: > 255: } else { > 256: assert(selector < _possible_objects.length(), "sanity"); > 257: _selected = (ObjectValue*) _possible_objects.at(selector); Any particular reason to reuse `ObjectValue` from `_possible_objects` instead of allocating a fresh one (as you do on `selector == -1` branch)? I'd prefer `ObjectMergeValue::select()` to always allocate a fresh `ObjectValue` when converting `ObjectMergeValue` + `ObjectMergeCandidateValue` into `ObjectValue`. src/hotspot/share/code/debugInfo.hpp line 199: > 197: // ObjectValue describing an object that was scalar replaced. > 198: > 199: class ObjectMergeValue: public ObjectValue { I find the decision to subclass `ObjectValue` confusing and error prone: now `is_object()` returns true for `ObjectMergeValue`, but you have to apply the selector first to turn it into `ObjectValue`. And now the order of checks matters, so you always have to perform `is_object_merge()` first and then follow it with `is_object()` guard.
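
The ordering hazard described above can be shown in isolation. What follows is a minimal standalone sketch, not HotSpot code: the class names mirror the ones in the patch, but the bodies are simplified assumptions, and `classify_object_first` / `classify_merge_first` are hypothetical helpers introduced only for illustration.

```cpp
#include <string>

// Simplified stand-ins for the debug-info classes under review.
// Only the type-test methods are modelled (assumption).
struct ScopeValue {
  virtual ~ScopeValue() = default;
  virtual bool is_object() const { return false; }
  virtual bool is_object_merge() const { return false; }
};

struct ObjectValue : ScopeValue {
  bool is_object() const override { return true; }
};

// Because this subclasses ObjectValue, is_object() is inherited and
// keeps returning true for a merge.
struct ObjectMergeValue : ObjectValue {
  bool is_object_merge() const override { return true; }
};

// Hypothetical helper: tests is_object() first, so a merge is
// misclassified as a plain object.
inline std::string classify_object_first(const ScopeValue& v) {
  if (v.is_object()) return "object";
  if (v.is_object_merge()) return "merge";
  return "other";
}

// Hypothetical helper: tests the more specific predicate first.
inline std::string classify_merge_first(const ScopeValue& v) {
  if (v.is_object_merge()) return "merge";
  if (v.is_object()) return "object";
  return "other";
}
```

With sibling subclasses instead of a parent/child relationship, both helpers would agree, which is roughly the point of the question that follows.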
You have 3 flavors of `ObjectValue` now: * good old `ObjectValue`; * `ObjectMergeValue` * merge candidates (`ObjectMergeCandidateValue`?) Does it make sense to introduce 3 different subclasses under `ObjectValue` to clearly distinguish the scenarios? src/hotspot/share/code/debugInfo.hpp line 242: > 240: bool is_cached() const { return _cached; } > 241: void set_cached(bool cached) { _cached = cached; } > 242: AutoBoxObjectValue(int id, ScopeValue* klass, bool only_merge_candidate = false) : ObjectValue(id, klass, only_merge_candidate), _cached(false) { } Any particular reason to allow `AutoBoxObjectValue` to be a merge candidate? src/hotspot/share/opto/escape.hpp line 593: > 591: // Methods related to Reduce Allocation Merges > 592: > 593: bool can_reduce_this_phi(PhiNode* ophi) const; On naming: IMO referring to "this" doesn't help, but adds noise. If you drop it ("can_reduce_this_phi" => "can_reduce_phi"), it's still clear what the method does. src/java.base/share/classes/java/security/AccessController.java line 786: > 784: // allocation merge Phi leading to it) might become NonEscaping and get > 785: // scalar replaced. The call below enforces 'result' to always escape. > 786: ensureMaterializedForStackWalk(result); Why don't you add the same call in the other `executePrivileged` overload? It has the very same code shape. 
------------- PR Review: https://git.openjdk.org/jdk/pull/12897#pullrequestreview-1396497913 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1174242946 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1174249820 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1174248472 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1174250881 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1174248735 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1174235850 From avoitylov at openjdk.org Sat Apr 22 07:01:50 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Sat, 22 Apr 2023 07:01:50 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit Message-ID: Provides missing implementation for arm32. Testing: hotspot/jtreg. ------------- Commit messages: - JDK-8305387 Changes: https://git.openjdk.org/jdk/pull/13596/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13596&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305387 Stats: 108 lines in 4 files changed: 95 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13596/head:pull/13596 PR: https://git.openjdk.org/jdk/pull/13596 From alanb at openjdk.org Sat Apr 22 08:03:44 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 22 Apr 2023 08:03:44 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> Message-ID: On Fri, 21 Apr 2023 23:53:45 GMT, Serguei Spitsyn wrote: >> Okay but JvmtiVTMSTransitionDisabler prevent the thread from being resumed for the -XX:-VMContinuations case? > > Thank you for the catch. Will check it. I have to extend the test to cover the BoundVirtualThread case enabled with the flag `-XX:-VMContinuations`. 
The scenario that I'm wondering about is where a virtual thread is resumed at around the same time that JVMTI StopThread is called. Not easy to test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1174328716 From alanb at openjdk.org Sat Apr 22 09:00:51 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 22 Apr 2023 09:00:51 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v10] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: <8qww81xiEqAhMY3DEwv7QL-gpWcrGq9QJnK4VuKqqhM=.54ddcbb8-eb51-467e-babe-1cba559c1b43@github.com> On Fri, 21 Apr 2023 13:21:09 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. 
I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into JDK-83050920-thread-sleep-subms > - Drop nanos_to_nanos_bounded > - Handle overflows > - More review comments > - Adjust test times > - Windows again > - Windows fixes: align(...) is only for power-of-two alignments > - Adjust assert > - Replace (park|sleep)_millis back with just (park|sleep) > - More review touchups > - ... and 13 more: https://git.openjdk.org/jdk/compare/5a00617b...a0c35f45 I don't have any other comments on this change, looks good. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13225#pullrequestreview-1396637701 From aph at openjdk.org Sat Apr 22 11:40:50 2023 From: aph at openjdk.org (Andrew Haley) Date: Sat, 22 Apr 2023 11:40:50 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit In-Reply-To: References: Message-ID: On Sat, 22 Apr 2023 06:54:59 GMT, Aleksei Voitylov wrote: > Provides missing implementation for arm32. > > Testing: hotspot/jtreg. Looks fine, except for one small nit. src/hotspot/cpu/arm/templateTable_arm.cpp line 2639: > 2637: // Update registers with resolved info > 2638: __ load_resolved_indy_entry(cache, index); > 2639: // Load-acquire the adapter method to match store-release in ResolvedIndyEntry::fill_in() I'm confused by this comment. `MacroAssembler::LoadLoad` is not an acquire barrier, although I guess it must be one, because `DMB LD` will be emitted. To match the release in `ResolvedIndyEntry::fill_in()` this should be `LoadLoad|LoadStore`.
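
For readers less familiar with the barrier vocabulary: an acquire load keeps both later loads and later stores from moving ahead of it, which is why it corresponds to `LoadLoad|LoadStore` rather than `LoadLoad` alone. The pairing with a release store can be sketched in portable C++ as below. This is an illustrative assumption, not the arm32 template-table code: `Entry`, `fill_in` and `read_if_resolved` are hypothetical stand-ins, and `std::atomic` replaces the hand-emitted barriers.

```cpp
#include <atomic>

// Hypothetical stand-in for a resolved call-site entry; not the HotSpot type.
struct Entry {
  int num_parameters = 0;                    // plain field, written before publication
  std::atomic<const char*> method{nullptr};  // non-null once the entry is resolved
};

// Writer side: fill in the plain fields, then publish with a release store
// so that the earlier writes are visible to any reader that observes `method`.
inline void fill_in(Entry& e, const char* method, int num_parameters) {
  e.num_parameters = num_parameters;
  e.method.store(method, std::memory_order_release);
}

// Reader side: the acquire load orders the later read of num_parameters
// (and any later stores) after the load of `method`.
inline bool read_if_resolved(const Entry& e, int& num_parameters_out) {
  const char* m = e.method.load(std::memory_order_acquire);
  if (m == nullptr) {
    return false;  // not resolved yet; the output parameter is left untouched
  }
  num_parameters_out = e.num_parameters;
  return true;
}
```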
------------- PR Review: https://git.openjdk.org/jdk/pull/13596#pullrequestreview-1396708437 PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1174393011 From duke at openjdk.org Sun Apr 23 15:23:51 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sun, 23 Apr 2023 15:23:51 GMT Subject: Integrated: 8305590: Remove nothrow exception specifications from operator new In-Reply-To: References: Message-ID: <0RuAzlA_w6Fo1ozWGLeyZxx-rWonpl0ioRtILLXTZYI=.918fbcfb-d617-4e72-89c7-3b69284b81df@github.com> On Mon, 17 Apr 2023 17:09:44 GMT, Afshin Zafari wrote: > - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. > > - The `-fcheck-new` is removed from the gcc compile flags. > > - The `operator new` and `operator delete` are deleted from `StackObj`. > > - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. > - The `Thread::operator new`with and without `null` return are removed. > > ### Tests > local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 > mach5: tiers 1-5 This pull request has now been integrated. 
Changeset: 0f51e632 Author: Afshin Zafari Committer: Jesper Wilhelmsson URL: https://git.openjdk.org/jdk/commit/0f51e6326373ff7d4a4d9a0e3a2788401f73405d Stats: 53 lines in 8 files changed: 3 ins; 27 del; 23 mod 8305590: Remove nothrow exception specifications from operator new Reviewed-by: coleenp, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/13498 From jwaters at openjdk.org Sun Apr 23 18:34:54 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 23 Apr 2023 18:34:54 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3] In-Reply-To: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> References: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> Message-ID: <7EfFlEYBncjzdoZzU_gdl8tB7kpj_z3ga_h-fXxy0Os=.42af4f0f-e003-4a54-bf79-603c80bdd12f@github.com> On Thu, 20 Apr 2023 08:41:58 GMT, Afshin Zafari wrote: >> - The `throw()` (i.e., no throw) specifications are removed from the instances of `operator new` where _do not_ return `nullptr`. >> >> - The `-fcheck-new` is removed from the gcc compile flags. >> >> - The `operator new` and `operator delete` are deleted from `StackObj`. >> >> - The `GrowableArrayCHeap::operator delete` is added to be matched with its corresponding allocation`AnyObj::operator new`, because gcc complains on that after removing the `-fcheck-new` flag. >> - The `Thread::operator new`with and without `null` return are removed. 
>> >> ### Tests >> local: linux-x64 gtest:GrowableArrayCHeap, macosx-aarch64 hotspot:tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8305590: Remove nothrow exception specifications from operator new I believe this may have missed removing the exception specifier from an operator new inside AnyObj, allocation.cpp, since gcc 12 and up on my end now refuses to compile HotSpot with this change. I'll create a cleanup change for this, if there isn't any opposition to that ------------- PR Comment: https://git.openjdk.org/jdk/pull/13498#issuecomment-1519127682 From kbarrett at openjdk.org Mon Apr 24 00:25:52 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Apr 2023 00:25:52 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3] In-Reply-To: <7EfFlEYBncjzdoZzU_gdl8tB7kpj_z3ga_h-fXxy0Os=.42af4f0f-e003-4a54-bf79-603c80bdd12f@github.com> References: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> <7EfFlEYBncjzdoZzU_gdl8tB7kpj_z3ga_h-fXxy0Os=.42af4f0f-e003-4a54-bf79-603c80bdd12f@github.com> Message-ID: On Sun, 23 Apr 2023 18:31:57 GMT, Julian Waters wrote: > I believe this may have missed removing the exception specifier from an operator new inside AnyObj, allocation.cpp, since gcc 12 and up on my end now refuses to compile HotSpot with this change. I'll create a cleanup change for this, if there isn't any opposition to that It builds for me, with gcc12.2. However, it does look like `AnyObj::operator new(size_t, MEMFLAGS)` should have had the nothrow exception spec removed (both in the header and the .cpp) but didn't. 
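
As background for why the exception specification matters here, the following is a minimal sketch under stated assumptions, not the HotSpot allocation code. When a class-specific `operator new` is declared non-throwing, the compiler null-checks its result and skips the constructor on `nullptr`; without the specification the compiler may assume the result is non-null, so the function must not return `nullptr`. `MayFail`, `NeverNull` and the `constructed` counter are hypothetical names, and `noexcept` is used in place of the older `throw()` spelling discussed in the thread.

```cpp
#include <cstddef>
#include <cstdlib>

static int constructed = 0;  // counts constructor runs, for illustration only

// Allocation may fail and reports it by returning nullptr, so operator new
// keeps a non-throwing specification; the new-expression then yields nullptr
// and the constructor is not run.
struct MayFail {
  static bool fail_next;
  MayFail() { ++constructed; }
  static void* operator new(std::size_t size) noexcept {
    if (fail_next) return nullptr;
    return std::malloc(size);
  }
  static void operator delete(void* p) { std::free(p); }
};
bool MayFail::fail_next = false;

// Allocation never returns nullptr (it aborts instead), so no exception
// specification is needed and the compiler can omit the null check.
struct NeverNull {
  NeverNull() { ++constructed; }
  static void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) std::abort();  // stand-in for an out-of-memory exit path
    return p;
  }
  static void operator delete(void* p) { std::free(p); }
};
```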
------------- PR Comment: https://git.openjdk.org/jdk/pull/13498#issuecomment-1519208242 From jvernee at openjdk.org Mon Apr 24 06:55:51 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Apr 2023 06:55:51 GMT Subject: RFR: 8304986: Upcall stubs should support capureCallState Message-ID: Implement captureCallState support for upcall stubs. The method handle of an upcall stub linked with this linker option has an additional leading memory segment parameter into which the capture state (e.g. errno) should be written. After returning from Java, this value is then actually written to the corresponding execution state. ------------- Depends on: https://git.openjdk.org/jdk/pull/13079 Commit messages: - Use orElseGet - remove redundant variable - update stubs - polish naming and comments - Fix aarch64 impl - simplify names - update linker doc - implement upcall ccs in fallback linker - working upcall CCS - refactor TestCaptureCallState - ... and 1 more: https://git.openjdk.org/jdk/compare/91f43d13...96f3d1f3 Changes: https://git.openjdk.org/jdk/pull/13588/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13588&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304986 Stats: 523 lines in 36 files changed: 331 ins; 23 del; 169 mod Patch: https://git.openjdk.org/jdk/pull/13588.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13588/head:pull/13588 PR: https://git.openjdk.org/jdk/pull/13588 From jvernee at openjdk.org Mon Apr 24 06:55:54 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Apr 2023 06:55:54 GMT Subject: RFR: 8304986: Upcall stubs should support capureCallState In-Reply-To: References: Message-ID: <-xoJpr6nJOQJthIQwP_eswGESCA8OPEKXfIcazlL3I4=.73c833a9-bfce-4fc6-a48d-589c0b6f3f16@github.com> On Fri, 21 Apr 2023 18:25:32 GMT, Jorn Vernee wrote: > Implement captureCallState support for upcall stubs. 
> > The method handle of an upcall stub linked with this linker option has an additional leading memory segment parameter into which the capture state (e.g. errno) should be written. After returning from Java, this value is then actually written to the corresponding execution state. src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 177: > 175: // should appear before the IMR segment parameter > 176: target = swapArguments(target, 0, 1); > 177: } e.g. target = (MemorySegment, MemorySegment, ...) MemorySegment ^ IMR segment ^ CCS segment These should be swapped, since the binding recipe for the capture state segment comes before the binding recipe for the IMR segment, since the former is inserted as a leading parameter in CallingSequenceBuilder::build, after a particular CallArranger inserts the IMR segment parameter. src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 192: > 190: public static UpcallStubFactory arrangeUpcallHelper(boolean isInMemoryReturn, boolean dropReturn, > 191: ABIDescriptor abi, CallingSequence callingSequence) { > 192: MethodType targetType = callingSequence.calleeMethodType(); Previously this was using a method type which was derived from the FunctionDescriptor. Since that corresponds to the native side, it didn't contain the return buffer. But, since the return buffer isn't passed on to the user, that worked out since the type would match the type of the target method handle. However, if we are capturing call state, we need to pass on the capture state segment to the user, so the target type has an additional leading MS parameter. To make that work, I've switched to using the calleeMethodType here, which contains both the return buffer and capture state parameters, and then I filter out the return buffer below.
This also avoids having to do the fake adaptation, which turned a method type returning a struct into a method type which accepts a pointer into which the return value should be written: `(...) -> MemorySegment` into `(MemorySegment, ...) -> void/MemorySegment`. The callee method type is already in the latter form. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13588#discussion_r1174070167 PR Review Comment: https://git.openjdk.org/jdk/pull/13588#discussion_r1174060551 From duke at openjdk.org Mon Apr 24 06:55:56 2023 From: duke at openjdk.org (ExE Boss) Date: Mon, 24 Apr 2023 06:55:56 GMT Subject: RFR: 8304986: Upcall stubs should support capureCallState In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 18:25:32 GMT, Jorn Vernee wrote: > Implement captureCallState support for upcall stubs. > > The method handle of an upcall stub linked with this linker option has an additional leading memory segment parameter into which the capture state (e.g. errno) should be written. After returning from Java, this value is then actually written to the corresponding execution state.
test/jdk/java/foreign/capturecallstate/TestCaptureCallState.java line 89: > 87: FunctionDescriptor downcallDesc = testCase.retValueLayout() > 88: .map(rl -> FunctionDescriptor.of(rl, JAVA_INT, rl)) > 89: .orElse(FunctionDescriptor.ofVoid(JAVA_INT)); Using [`Optional::orElseGet(...)`] avoids allocating the `FunctionDescriptor` corresponding to `(int)void` when `testCase.retValueLayout()` is present: Suggestion: .orElseGet(() -> FunctionDescriptor.ofVoid(JAVA_INT)); [`Optional::orElseGet(...)`]: https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/Optional.html#orElseGet(java.util.function.Supplier) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13588#discussion_r1174607193 From duke at openjdk.org Mon Apr 24 08:01:40 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Mon, 24 Apr 2023 08:01:40 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: References: Message-ID: > As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. > This is immensely useful for investigating time-to-safepoint issues in low latency space.
Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: Fixed jlong conversion order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13373/files - new: https://git.openjdk.org/jdk/pull/13373/files/cc2ed889..a8fdc733 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13373/head:pull/13373 PR: https://git.openjdk.org/jdk/pull/13373 From duke at openjdk.org Mon Apr 24 08:01:44 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Mon, 24 Apr 2023 08:01:44 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 10:31:30 GMT, Martin Doerr wrote: > Changing the type of a product flag requires a CSR (https://wiki.openjdk.org/display/csr/CSR+FAQs). This is my first of two initial contributions, I don't have Author status yet. Could you please raise CSR on my behalf? > src/hotspot/share/runtime/safepoint.cpp line 382: > >> 380: // Set the limit time, so that it can be compared to see if this has taken >> 381: // too long to complete. >> 382: safepoint_limit_time = SafepointTracing::start_of_safepoint() + (jlong)SafepointTimeoutDelay * NANOSECS_PER_MILLISEC; > > I think it should be `(SafepointTimeoutDelay * NANOSECS_PER_MILLISEC)` before converting to jlong. Thank you @TheRealMDoerr for reviewing. I just committed a change with your suggestion. 
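
The effect of the conversion order discussed above is easy to see in a small standalone sketch. The names below are assumptions made for illustration: `jlong` and `NANOSECS_PER_MILLISEC` are redefined locally rather than taken from HotSpot headers, and the two helper functions are hypothetical.

```cpp
#include <cstdint>

using jlong = std::int64_t;                       // local stand-in for the JNI type
constexpr jlong NANOSECS_PER_MILLISEC = 1000000;  // local stand-in for the HotSpot constant

// Cast binds tighter than multiplication: the delay is truncated to whole
// milliseconds first, so the fractional part is lost.
inline jlong timeout_nanos_truncating(double delay_ms) {
  return (jlong)delay_ms * NANOSECS_PER_MILLISEC;
}

// Multiply in floating point first, then convert: sub-millisecond delays survive.
inline jlong timeout_nanos(double delay_ms) {
  return (jlong)(delay_ms * NANOSECS_PER_MILLISEC);
}
```

With a delay of 0.5 ms the first form yields 0 ns, while the second yields 500000 ns, which is the behavior the fractional flag is meant to provide.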
------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1519570097 PR Review Comment: https://git.openjdk.org/jdk/pull/13373#discussion_r1174916418 From kbarrett at openjdk.org Mon Apr 24 08:52:18 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Apr 2023 08:52:18 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path Message-ID: Please review this change to the string deduplication thread to make it a kind of JavaThread rather than a ConcurrentGCThread. There are several pieces to this change: (1) New class StringDedupThread (derived from JavaThread), separate from StringDedup::Processor (which is now just a CHeapObj instead of deriving from ConcurrentGCThread). The thread no longer needs to or supports being stopped, like other similar threads. It also needs to be started later, once Java threads are supported. Also don't need an explicit visitor, since it will be in the normal Java threads list. This separation made the changeover a little cleaner to develop, and made the servicability support a little cleaner too. (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, instead of using the SuspendibleThreadSet facility. (3) Because we're using ThreadBlockInVM, which has a different usage style from STS, the tracking of time spent by the processor blocked for safepoints doesn't really work. It's not very important anyway, since normal thread descheduling can also affect the normal processing times being gathered and reported. So we just drop the so-called "blocked" time and associated infrastructure, simplifying Stat tracking a bit. Also renamed the "concurrent" stat to be "active", since it's all in a JavaThread now. (4) To avoid #include problems, moved the definition of JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, where one of the functions it calls also is defined. 
(5) Added servicability support for the new thread. Testing: mach5 tier1-3 with -XX:+UseStringDeduplication. The test runtime/cds/DeterministicDump.java fails intermittently with that option, which is not surprising - see JDK-8306712. I was never able to reproduce the failure; it's likely quite timing sensitive. The fix of changing the type is based on StefanK's comment that ZResurrection doesn't expect a non-Java thread to perform load-barriers. ------------- Commit messages: - fix stray tab - move is_active_Java_thread - copyrights - servicabilty support - use JavaThread - separate thread class - simplify init - do not pass around STS joiner - remove no longer needed Phase enum - remove block_phase et al Changes: https://git.openjdk.org/jdk/pull/13607/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13607&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305566 Stats: 440 lines in 18 files changed: 193 ins; 146 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/13607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13607/head:pull/13607 PR: https://git.openjdk.org/jdk/pull/13607 From rrich at openjdk.org Mon Apr 24 08:55:50 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 24 Apr 2023 08:55:50 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: <220qeklIH1qHdlGlQtQ2qz3TJmCz_P5zMho1eaVQ0Ps=.4dd8a303-943f-4e80-884a-2e69e33ba881@github.com> On Fri, 21 Apr 2023 13:09:37 GMT, Fredrik Bredberg wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. 
>> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated after review I'll do another round of testing. Thanks, Richard. ------------- PR Review: https://git.openjdk.org/jdk/pull/13477#pullrequestreview-1397485199 From rrich at openjdk.org Mon Apr 24 08:55:54 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 24 Apr 2023 08:55:54 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 13:36:55 GMT, Fredrik Bredberg wrote: >> src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 510: >> >>> 508: // we need to set the locals so that the caller of new_stack_frame() can call >>> 509: // ContinuationHelper::InterpretedFrame::frame_bottom >>> 510: // copy relativized locals from the heap frame >> >> Maybe reduce the comment? >> >> // we need to copy the locals so that the caller of new_stack_frame() can call >> // ContinuationHelper::InterpretedFrame::frame_bottom > > I think the comment is good as it is, because it describes just why we need to set the locals this early (and not the rest of the members in the fixed_frame). So the current version // we need to set the locals so that the caller of new_stack_frame() can call // ContinuationHelper::InterpretedFrame::frame_bottom // copy relativized locals from the heap frame says that the locals are set and then again that they are copied. 
I thought that could be combined to what I suggested. But I'm ok if you still prefer the current version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1174978882 From avoitylov at openjdk.org Mon Apr 24 09:06:44 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Mon, 24 Apr 2023 09:06:44 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v2] In-Reply-To: References: Message-ID: > Provides missing implementation for arm32. > > Testing: hotspot/jtreg. Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13596/files - new: https://git.openjdk.org/jdk/pull/13596/files/dec02089..4c19fdb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13596&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13596&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13596/head:pull/13596 PR: https://git.openjdk.org/jdk/pull/13596 From avoitylov at openjdk.org Mon Apr 24 09:10:01 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Mon, 24 Apr 2023 09:10:01 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v2] In-Reply-To: References: Message-ID: On Sat, 22 Apr 2023 11:38:04 GMT, Andrew Haley wrote: >> Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comments > > src/hotspot/cpu/arm/templateTable_arm.cpp line 2639: > >> 2637: // Update registers with resolved info >> 2638: __ load_resolved_indy_entry(cache, index); >> 2639: // Load-acquire the adapter method to match store-release in ResolvedIndyEntry::fill_in() > > I'm confused by this comment. 
`MacroAssembler::LoadLoad` is not an acquire barrier, although I guess it must be one, because `DMB LD` will be emitted. To match the release in `ResolvedIndyEntry::fill_in()` this should be `LoadLoad|LoadStore`. Fixed, thanks for catching this. Note that currently DMB SY is emitted either way, but it's important to keep the semantics consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1174996181 From stefank at openjdk.org Mon Apr 24 09:10:45 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 24 Apr 2023 09:10:45 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 08:24:53 GMT, Kim Barrett wrote: > Please review this change to the string deduplication thread to make it a kind > of JavaThread rather than a ConcurrentGCThread. There are several pieces to > this change: > > (1) New class StringDedupThread (derived from JavaThread), separate from > StringDedup::Processor (which is now just a CHeapObj instead of deriving from > ConcurrentGCThread). The thread no longer needs to or supports being stopped, > like other similar threads. It also needs to be started later, once Java > threads are supported. Also don't need an explicit visitor, since it will be > in the normal Java threads list. This separation made the changeover a little > cleaner to develop, and made the serviceability support a little cleaner too. > > (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, > instead of using the SuspendibleThreadSet facility. > > (3) Because we're using ThreadBlockInVM, which has a different usage style > from STS, the tracking of time spent by the processor blocked for safepoints > doesn't really work.
It's not very important anyway, since normal thread > descheduling can also affect the normal processing times being gathered and > reported. So we just drop the so-called "blocked" time and associated > infrastructure, simplifying Stat tracking a bit. Also renamed the > "concurrent" stat to be "active", since it's all in a JavaThread now. > > (4) To avoid #include problems, moved the definition of > JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, > where one of the functions it calls also is defined. > > (5) Added servicability support for the new thread. > > Testing: > mach5 tier1-3 with -XX:+UseStringDeduplication. > The test runtime/cds/DeterministicDump.java fails intermittently with that > option, which is not surprising - see JDK-8306712. > > I was never able to reproduce the failure; it's likely quite timing sensitive. > The fix of changing the type is based on StefanK's comment that ZResurrection > doesn't expect a non-Java thread to perform load-barriers. I think this looks sensible to me, though I'm not very familiar with the current StringDedup code, so consider this a partial review only. src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.hpp line 29: > 27: > 28: #include "memory/allocation.hpp" > 29: #include "gc/shared/stringdedup/stringDedup.hpp" sort order ------------- Marked as reviewed by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13607#pullrequestreview-1397500893 PR Review Comment: https://git.openjdk.org/jdk/pull/13607#discussion_r1174989267 From kbarrett at openjdk.org Mon Apr 24 09:25:02 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Apr 2023 09:25:02 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path [v2] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 09:00:53 GMT, Stefan Karlsson wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> fix include order > > src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.hpp line 29: > >> 27: >> 28: #include "memory/allocation.hpp" >> 29: #include "gc/shared/stringdedup/stringDedup.hpp" > > sort order oops. fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13607#discussion_r1175009838 From kbarrett at openjdk.org Mon Apr 24 09:25:01 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Apr 2023 09:25:01 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path [v2] In-Reply-To: References: Message-ID: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> > Please review this change to the string deduplication thread to make it a kind > of JavaThread rather than a ConcurrentGCThread. There are several pieces to > this change: > > (1) New class StringDedupThread (derived from JavaThread), separate from > StringDedup::Processor (which is now just a CHeapObj instead of deriving from > ConcurrentGCThread). The thread no longer needs to or supports being stopped, > like other similar threads. It also needs to be started later, once Java > threads are supported. 
Also don't need an explicit visitor, since it will be > in the normal Java threads list. This separation made the changeover a little > cleaner to develop, and made the servicability support a little cleaner too. > > (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, > instead of using the SuspendibleThreadSet facility. > > (3) Because we're using ThreadBlockInVM, which has a different usage style > from STS, the tracking of time spent by the processor blocked for safepoints > doesn't really work. It's not very important anyway, since normal thread > descheduling can also affect the normal processing times being gathered and > reported. So we just drop the so-called "blocked" time and associated > infrastructure, simplifying Stat tracking a bit. Also renamed the > "concurrent" stat to be "active", since it's all in a JavaThread now. > > (4) To avoid #include problems, moved the definition of > JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, > where one of the functions it calls also is defined. > > (5) Added servicability support for the new thread. > > Testing: > mach5 tier1-3 with -XX:+UseStringDeduplication. > The test runtime/cds/DeterministicDump.java fails intermittently with that > option, which is not surprising - see JDK-8306712. > > I was never able to reproduce the failure; it's likely quite timing sensitive. > The fix of changing the type is based on StefanK's comment that ZResurrection > doesn't expect a non-Java thread to perform load-barriers. 
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: fix include order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13607/files - new: https://git.openjdk.org/jdk/pull/13607/files/d4e94b89..f17cc6be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13607&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13607&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13607/head:pull/13607 PR: https://git.openjdk.org/jdk/pull/13607 From shade at openjdk.org Mon Apr 24 09:25:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Apr 2023 09:25:31 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path In-Reply-To: References: Message-ID: <55jKHzOyDChs03tfesgftPEzppeIIaJhS3AZ8uxNrXw=.3c35cfd0-01fa-49fd-8e4a-baec119edc9c@github.com> On Mon, 24 Apr 2023 08:24:53 GMT, Kim Barrett wrote: > Please review this change to the string deduplication thread to make it a kind > of JavaThread rather than a ConcurrentGCThread. There are several pieces to > this change: > > (1) New class StringDedupThread (derived from JavaThread), separate from > StringDedup::Processor (which is now just a CHeapObj instead of deriving from > ConcurrentGCThread). The thread no longer needs to or supports being stopped, > like other similar threads. It also needs to be started later, once Java > threads are supported. Also don't need an explicit visitor, since it will be > in the normal Java threads list. This separation made the changeover a little > cleaner to develop, and made the servicability support a little cleaner too. > > (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, > instead of using the SuspendibleThreadSet facility. 
> > (3) Because we're using ThreadBlockInVM, which has a different usage style > from STS, the tracking of time spent by the processor blocked for safepoints > doesn't really work. It's not very important anyway, since normal thread > descheduling can also affect the normal processing times being gathered and > reported. So we just drop the so-called "blocked" time and associated > infrastructure, simplifying Stat tracking a bit. Also renamed the > "concurrent" stat to be "active", since it's all in a JavaThread now. > > (4) To avoid #include problems, moved the definition of > JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, > where one of the functions it calls also is defined. > > (5) Added servicability support for the new thread. > > Testing: > mach5 tier1-3 with -XX:+UseStringDeduplication. > The test runtime/cds/DeterministicDump.java fails intermittently with that > option, which is not surprising - see JDK-8306712. > > I was never able to reproduce the failure; it's likely quite timing sensitive. > The fix of changing the type is based on StefanK's comment that ZResurrection > doesn't expect a non-Java thread to perform load-barriers. Could we please change the synopsis to something more relevant? This PR does not only fix the ZGC test failure, but does a more fundamental change: switching string dedup thread from being `ConcurrentGCThread` to `JavaThread`, and so it affects more than one GC. The synopsis should reflect that, I think. (This would also be cleaner for potential backports, if any). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13607#issuecomment-1519707606 From shade at openjdk.org Mon Apr 24 09:41:43 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Apr 2023 09:41:43 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path [v2] In-Reply-To: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> References: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> Message-ID: On Mon, 24 Apr 2023 09:25:01 GMT, Kim Barrett wrote: >> Please review this change to the string deduplication thread to make it a kind >> of JavaThread rather than a ConcurrentGCThread. There are several pieces to >> this change: >> >> (1) New class StringDedupThread (derived from JavaThread), separate from >> StringDedup::Processor (which is now just a CHeapObj instead of deriving from >> ConcurrentGCThread). The thread no longer needs to or supports being stopped, >> like other similar threads. It also needs to be started later, once Java >> threads are supported. Also don't need an explicit visitor, since it will be >> in the normal Java threads list. This separation made the changeover a little >> cleaner to develop, and made the servicability support a little cleaner too. >> >> (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, >> instead of using the SuspendibleThreadSet facility. >> >> (3) Because we're using ThreadBlockInVM, which has a different usage style >> from STS, the tracking of time spent by the processor blocked for safepoints >> doesn't really work. It's not very important anyway, since normal thread >> descheduling can also affect the normal processing times being gathered and >> reported. So we just drop the so-called "blocked" time and associated >> infrastructure, simplifying Stat tracking a bit. 
Also renamed the >> "concurrent" stat to be "active", since it's all in a JavaThread now. >> >> (4) To avoid #include problems, moved the definition of >> JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, >> where one of the functions it calls also is defined. >> >> (5) Added servicability support for the new thread. >> >> Testing: >> mach5 tier1-3 with -XX:+UseStringDeduplication. >> The test runtime/cds/DeterministicDump.java fails intermittently with that >> option, which is not surprising - see JDK-8306712. >> >> I was never able to reproduce the failure; it's likely quite timing sensitive. >> The fix of changing the type is based on StefanK's comment that ZResurrection >> doesn't expect a non-Java thread to perform load-barriers. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > fix include order I like this simplification a lot, thanks! I am running some Shenandoah string-dedup tests now. (Formally requesting the change of synopsis) ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13607#pullrequestreview-1397574534 Changes requested by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13607#pullrequestreview-1397576017 From duke at openjdk.org Mon Apr 24 09:42:47 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Mon, 24 Apr 2023 09:42:47 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 10:19:25 GMT, Richard Reingruber wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated after review > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1093: > >> 1091: >> 1092: // Some architectures (like AArch64/PPC64/RISC-V) adds padding between the locals and the fixed_frame to keep the fp 16-byte-aligned. 
>> 1093: // On those architectures we freeze the padding in order to keep the same localized pointer values. > > Suggestion: > > // On those architectures we freeze the padding in order to keep the same relative references. Tried to clarify by changing the comment to: `// On those architectures we freeze the padding in order to keep the same fp-relative offsets in the fixed_frame.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1175034185 From jbechberger at openjdk.org Mon Apr 24 09:52:05 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 24 Apr 2023 09:52:05 GMT Subject: RFR: 8297967: Make frame::safe_for_sender safer [v6] In-Reply-To: References: Message-ID: > Makes `frame::safe_for_sender` safer by checking that the location of the return address, sender stack pointer, and link address is accessible. This makes the method safer in the case of broken frames. Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Remove errorneously added check - Remove check for value that might be null - More SafeFetch - Make frame::safe_for_sender safer with SafeFetch ------------- Changes: https://git.openjdk.org/jdk/pull/11461/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11461&range=05 Stats: 31 lines in 4 files changed: 25 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11461/head:pull/11461 PR: https://git.openjdk.org/jdk/pull/11461 From jbechberger at openjdk.org Mon Apr 24 09:52:07 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 24 Apr 2023 09:52:07 GMT Subject: RFR: 8297967: Make frame::safe_for_sender safer [v2] In-Reply-To: References: <7kRWzMPPMa04KSKXQKj-V7VFYgbbWtjFiw9erejlU3o=.c0ba5205-96fe-450b-808f-f91beecb0632@github.com> Message-ID: On Tue, 6 Dec 2022 08:07:39 GMT, Johannes Bechberger wrote: >> Probably not, but not relevant to the comments about line 268. > > You're right. I'm going to look into it today more thoroughly. I removed the check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11461#discussion_r1175036415 From shade at openjdk.org Mon Apr 24 11:57:51 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Apr 2023 11:57:51 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path [v2] In-Reply-To: References: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> Message-ID: On Mon, 24 Apr 2023 09:39:07 GMT, Aleksey Shipilev wrote: > I like this simplification a lot, thanks! I am running some Shenandoah string-dedup tests now. Ran `gc/shenandoah/TestStringDedup*` on x86_64 and AArch64 for 100 times without a problem. These tests usually fail when there are bugs in string dedup. So I think we are clear there! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13607#issuecomment-1520010545 From aboldtch at openjdk.org Mon Apr 24 12:00:11 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 24 Apr 2023 12:00:11 GMT Subject: RFR: 8306732: TruncatedSeq::predict_next() attempts linear regression with only one data point Message-ID: TruncatedSeq::predict_next() attempts linear regression with only one data point, which leads to a division by zero. (There are infinitely many linear functions that fit a single point equally well.) I suggest we do what we do for the zero-point case, namely pick one of the linear functions. For zero points the current version picks `y = 0 + 0*x`, and the suggestion is that for one point `P` the function `y = P_y + 0*x` is picked. Tested for ZGC tier1-7 on Oracle supported platforms. Only ZGC uses TruncatedSeq::predict_next(). ------------- Commit messages: - 8306732: TruncatedSeq::predict_next() attempts linear regression with only one data point Changes: https://git.openjdk.org/jdk/pull/13614/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13614&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306732 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13614.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13614/head:pull/13614 PR: https://git.openjdk.org/jdk/pull/13614 From jvernee at openjdk.org Mon Apr 24 13:20:05 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Apr 2023 13:20:05 GMT Subject: RFR: 8304986: Upcall stubs should support capureCallState In-Reply-To: References: Message-ID: <_puDoD0ej-K0s8v-fmTR34AiXJWuRKW37Il4A_mWyAU=.ee7f2671-6e3b-4621-82d2-0b265013f81d@github.com> On Fri, 21 Apr 2023 18:25:32 GMT, Jorn Vernee wrote: > Implement captureCallState support for upcall stubs. > > The method handle of an upcall stub linked with this linker option has an additional leading memory segment parameter into which the capture state (e.g.
errno) should be written. After returning from Java, this value is then actually written to the corresponding execution state. There is a longer string to pull here: someone might want to receive the capture state as input to an upcall, or conversely, pass the capture state to a downcall. Right now this seems to be not really needed, so will revisit later ------------- PR Comment: https://git.openjdk.org/jdk/pull/13588#issuecomment-1520143832 From jvernee at openjdk.org Mon Apr 24 13:20:06 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Apr 2023 13:20:06 GMT Subject: Withdrawn: 8304986: Upcall stubs should support capureCallState In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 18:25:32 GMT, Jorn Vernee wrote: > Implement captureCallState support for upcall stubs. > > The method handle of an upcall stub linked with this linker option has an additional leading memory segment parameter into which the capture state (e.g. errno) should be written. After returning from Java, this value is then actually written to the corresponding execution state. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13588 From jbechberger at openjdk.org Mon Apr 24 14:07:01 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 24 Apr 2023 14:07:01 GMT Subject: RFR: 8297967: Make frame::safe_for_sender safer [v6] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 09:52:05 GMT, Johannes Bechberger wrote: >> Makes `frame::safe_for_sender` safer by checking that the location of the return address, sender stack pointer, and link address is accessible. This makes the method safer in the case of broken frames. > > Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: > > - Remove errorneously added check > - Remove check for value that might be null > - More SafeFetch > - Make frame::safe_for_sender safer with SafeFetch The issue happens reproducibly while fuzzing profiling APIs. The change would remove a large class of errors that I found during testing. The issues can happen in theory too, e.g. due to a modified frame passed in by the profiler. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11461#issuecomment-1520223049 From mdoerr at openjdk.org Mon Apr 24 14:02:59 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 24 Apr 2023 14:02:59 GMT Subject: RFR: 8297967: Make frame::safe_for_sender safer [v6] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 09:52:05 GMT, Johannes Bechberger wrote: >> Makes `frame::safe_for_sender` safer by checking that the location of the return address, sender stack pointer, and link address is accessible. This makes the method safer in the case of broken frames. > > Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Remove errorneously added check > - Remove check for value that might be null > - More SafeFetch > - Make frame::safe_for_sender safer with SafeFetch I'm still trying to understand the underlying problem. I guess that FP points into a read-protected (or uncommitted) part of the stack which isn't caught by the `fp_safe` checks for some reason. Using 3 SafeFetch checks is a bit much, because we shouldn't have gaps within a frame. On the other hand, checking exactly the 3 fields we need sounds like a good idea, and your checks should be affordable from a performance point of view. I'm ok with it, but I'd like to hear more opinions.
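For readers following the `frame::safe_for_sender` discussion, the idea of probing exactly the three slots that will later be read can be sketched in standalone C++. This is only an illustration under stated assumptions, not the code in the pull request: `FakeStack`, `safe_fetch`, `frame_slots_readable`, and the slot offsets are hypothetical names chosen for the example, and the fault-tolerant read is modelled with a simple range check rather than HotSpot's real SafeFetch mechanism.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the committed, readable part of a thread stack.
// Assumption: HotSpot's SafeFetch is replaced here by a plain range check.
struct FakeStack {
  std::uintptr_t base;                // address of words[0]
  std::vector<std::uintptr_t> words;  // readable stack words

  // Models a fault-tolerant read: reports failure instead of crashing.
  bool safe_fetch(std::uintptr_t addr, std::uintptr_t* out) const {
    const std::uintptr_t word = sizeof(std::uintptr_t);
    if (addr < base || (addr - base) % word != 0) return false;
    const std::uintptr_t index = (addr - base) / word;
    if (index >= words.size()) return false;
    *out = words[index];
    return true;
  }
};

// Hypothetical slot offsets, in words relative to fp (an x86_64-like layout
// is assumed purely for illustration).
constexpr std::uintptr_t kLinkOffset = 0;        // saved caller fp
constexpr std::uintptr_t kReturnAddrOffset = 1;  // return address
constexpr std::uintptr_t kSenderSpOffset = 2;    // first word of the sender frame

// Returns true only if all three slots can be read without faulting.
bool frame_slots_readable(const FakeStack& stack, std::uintptr_t fp) {
  const std::uintptr_t word = sizeof(std::uintptr_t);
  const std::uintptr_t offsets[] = {kLinkOffset, kReturnAddrOffset, kSenderSpOffset};
  for (std::uintptr_t offset : offsets) {
    std::uintptr_t ignored = 0;
    if (!stack.safe_fetch(fp + offset * word, &ignored)) return false;
  }
  return true;
}
```

With a randomized `fp`, as in the fuzzing scenario described in this thread, such a check would reject the frame up front instead of faulting on the first dereference. Whether three probes or a single one is the right trade-off is exactly the open question in the review.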
------------- PR Comment: https://git.openjdk.org/jdk/pull/11461#issuecomment-1520215175 From mdoerr at openjdk.org Mon Apr 24 14:17:59 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 24 Apr 2023 14:17:59 GMT Subject: RFR: 8297967: Make frame::safe_for_sender safer [v6] In-Reply-To: References: Message-ID: <9w0NadlMKYNBKWp60Zy-XvK18JimF0RtBfXQ0jObsxM=.6e79a8a3-73f7-4c37-9c1f-763e6414bc0a@github.com> On Mon, 24 Apr 2023 09:52:05 GMT, Johannes Bechberger wrote: >> Makes `frame::safe_for_sender` safer by checking that the location of the return address, sender stack pointer, and link address is accessible. This makes the method safer in the case of broken frames. > > Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Remove errorneously added check > - Remove check for value that might be null > - More SafeFetch > - Make frame::safe_for_sender safer with SafeFetch Can you dump the broken frame when one of your safefetch checks fail or create a hs_err file? That might shed more light into the problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11461#issuecomment-1520242536 From mdoerr at openjdk.org Mon Apr 24 14:23:44 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 24 Apr 2023 14:23:44 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 07:57:46 GMT, Wojciech Kudla wrote: > > Changing the type of a product flag requires a CSR (https://wiki.openjdk.org/display/csr/CSR+FAQs). > > This is my first of two initial contributions, I don't have Author status yet. Could you please raise CSR on my behalf? Sure. Can you provide the content here? It should contain: -Summary -Problem -Solution -Specification (See https://bugs.openjdk.org/browse/JDK-8303500 for example.) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1520256706 From jbechberger at openjdk.org Mon Apr 24 14:24:00 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 24 Apr 2023 14:24:00 GMT Subject: RFR: 8297967: Make frame::safe_for_sender safer [v6] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 09:52:05 GMT, Johannes Bechberger wrote: >> Makes `frame::safe_for_sender` safer by checking that the location of the return address, sender stack pointer, and link address is accessible. This makes the method safer in the case of broken frames. > > Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Remove errorneously added check > - Remove check for value that might be null > - More SafeFetch > - Make frame::safe_for_sender safer with SafeFetch This happens when the frames frame and stack pointer are randomized (by adding a random number between 0 and 100000) during the fuzzing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11461#issuecomment-1520258991 From mdoerr at openjdk.org Mon Apr 24 14:28:44 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 24 Apr 2023 14:28:44 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 08:01:40 GMT, Wojciech Kudla wrote: >> As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. >> This is immensely useful for investigating time-to-safepoint issues in low latency space. > > Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: > > Fixed jlong conversion order Would be helpful if you could enable the Pre-submit test (GitHub actions). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1520269447 From cstein at openjdk.org Mon Apr 24 15:40:04 2023 From: cstein at openjdk.org (Christian Stein) Date: Mon, 24 Apr 2023 15:40:04 GMT Subject: Integrated: 8304896: Update to use jtreg 7.2 In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 14:56:16 GMT, Christian Stein wrote: > Please review the change to update to using jtreg 7.2. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. This pull request has now been integrated. Changeset: 2763cf14 Author: Christian Stein URL: https://git.openjdk.org/jdk/commit/2763cf14e6a174511ae8af471690ef18b10b3998 Stats: 9 lines in 8 files changed: 0 ins; 0 del; 9 mod 8304896: Update to use jtreg 7.2 Reviewed-by: erikj, lmesnik, iris ------------- PR: https://git.openjdk.org/jdk/pull/13496 From shade at openjdk.org Mon Apr 24 15:46:55 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Apr 2023 15:46:55 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v2] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 09:06:44 GMT, Aleksei Voitylov wrote: >> Provides missing implementation for arm32. >> >> Testing: hotspot/jtreg. > > Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: > > address review comments This looks okay, but I have a few questions. src/hotspot/cpu/arm/templateTable_arm.cpp line 2613: > 2611: // The rmethod register is input and overwritten to be the adapter method for the > 2612: // indy call. Link Register (lr) is set to the return address for the adapter and > 2613: // an appendix may be pushed to the stack. Registers r0-r3 are clobbered "Registers r0-r3 are clobbered". This is a copy-paste error, it seems. Which registers this method really clobbers? R1..R3? 
src/hotspot/cpu/arm/templateTable_arm.cpp line 2641: > 2639: // Load-acquire the adapter method to match store-release in ResolvedIndyEntry::fill_in() > 2640: __ ldr(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); > 2641: TemplateTable::volatile_barrier(MacroAssembler::Membar_mask_bits(MacroAssembler::LoadLoad | MacroAssembler::LoadStore), noreg, true); Why calling `TemplateTable::volatile_barrier`, if you can do `__ membar(...)`? ------------- PR Review: https://git.openjdk.org/jdk/pull/13596#pullrequestreview-1398312154 PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1175469389 PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1175473161 From never at openjdk.org Mon Apr 24 16:50:59 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 24 Apr 2023 16:50:59 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v5] In-Reply-To: References: Message-ID: > This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into tkr-zgc - Use reloc for guard location and read internal fields using HotSpot accessors - Merge branch 'master' into tkr-zgc - Remove access to extra data section from Java code - Handle concurrent unloading - Merge branch 'master' into tkr-zgc - Add missing declaration - Replace NULL with nullptr in new code - Merge branch 'master' into tkr-zgc - Review fixes - ...
and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391 ------------- Changes: https://git.openjdk.org/jdk/pull/11996/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11996&range=04 Stats: 791 lines in 38 files changed: 467 ins; 143 del; 181 mod Patch: https://git.openjdk.org/jdk/pull/11996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11996/head:pull/11996 PR: https://git.openjdk.org/jdk/pull/11996 From never at openjdk.org Mon Apr 24 16:56:03 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 24 Apr 2023 16:56:03 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v5] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 16:50:59 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - Replace NULL with nullptr in new code > - Merge branch 'master' into tkr-zgc > - Review fixes > - ... and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391 I've pushed a final round of fixes.
In particular, the JVMCI Java code used to read fields in Klass using direct memory access, but it must go through the C++ accessors to deal with concurrent unloading. I also removed the Java side access to the extra data section and replaced it with a new getExceptionSeen accessor, which is the only case where we use extra data. I filed https://bugs.openjdk.org/browse/JDK-8306767 for the method data extra data repacking issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1520515290 From avoitylov at openjdk.org Mon Apr 24 18:09:51 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Mon, 24 Apr 2023 18:09:51 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v2] In-Reply-To: References: Message-ID: <13qgImKsWoIdZ9rfUtnJswWLFopJhzou6C8CXUiNqnA=.c84bd9f0-d0c6-412c-91c1-2f5c33bac64e@github.com> On Mon, 24 Apr 2023 15:36:42 GMT, Aleksey Shipilev wrote: >> Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: >> >> address review comments > > src/hotspot/cpu/arm/templateTable_arm.cpp line 2613: > >> 2611: // The rmethod register is input and overwritten to be the adapter method for the >> 2612: // indy call. Link Register (lr) is set to the return address for the adapter and >> 2613: // an appendix may be pushed to the stack. Registers r0-r3 are clobbered > > "Registers r0-r3 are clobbered". This is a copy-paste error, it seems. Which registers this method really clobbers? R1..R3? Fixed comment, thanks! > src/hotspot/cpu/arm/templateTable_arm.cpp line 2641: > >> 2639: // Load-acquire the adapter method to match store-release in ResolvedIndyEntry::fill_in() >> 2640: __ ldr(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); >> 2641: TemplateTable::volatile_barrier(MacroAssembler::Membar_mask_bits(MacroAssembler::LoadLoad | MacroAssembler::LoadStore), noreg, true); > > Why calling `TemplateTable::volatile_barrier`, if you can do `__ membar(...)`?
Well, volatile_barrier is essentially a shell around membar, so here I'm just following the style of the code around. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1175611079 PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1175615855 From avoitylov at openjdk.org Mon Apr 24 18:09:45 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Mon, 24 Apr 2023 18:09:45 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v3] In-Reply-To: References: Message-ID: > Provides missing implementation for arm32. > > Testing: hotspot/jtreg. Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: address Aleksey review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13596/files - new: https://git.openjdk.org/jdk/pull/13596/files/4c19fdb1..ebcaeb97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13596&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13596&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13596/head:pull/13596 PR: https://git.openjdk.org/jdk/pull/13596 From shade at openjdk.org Mon Apr 24 18:41:29 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Apr 2023 18:41:29 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v2] In-Reply-To: <13qgImKsWoIdZ9rfUtnJswWLFopJhzou6C8CXUiNqnA=.c84bd9f0-d0c6-412c-91c1-2f5c33bac64e@github.com> References: <13qgImKsWoIdZ9rfUtnJswWLFopJhzou6C8CXUiNqnA=.c84bd9f0-d0c6-412c-91c1-2f5c33bac64e@github.com> Message-ID: On Mon, 24 Apr 2023 17:55:00 GMT, Aleksei Voitylov wrote: >> src/hotspot/cpu/arm/templateTable_arm.cpp line 2641: >> >>> 2639: // Load-acquire the adapter method to match store-release in ResolvedIndyEntry::fill_in() >>> 2640: __ ldr(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); >>> 2641: 
TemplateTable::volatile_barrier(MacroAssembler::Membar_mask_bits(MacroAssembler::LoadLoad | MacroAssembler::LoadStore), noreg, true); >> >> Why calling `TemplateTable::volatile_barrier`, if you can do `__ membar(...)`? > > Well, volatile_barrier is essentially a shell around membar, so here I'm just following the style of the code around. Honestly, the use of `TemplateTable::volatile_barrier` in `get_cache_and_index_and_bytecode_at_bcp` looks like a fluke/leftover from AArch64 port. `__ membar` is used everywhere else. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1175656214 From matsaave at openjdk.org Mon Apr 24 19:44:24 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 24 Apr 2023 19:44:24 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v3] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 18:09:45 GMT, Aleksei Voitylov wrote: >> Provides missing implementation for arm32. >> >> Testing: hotspot/jtreg. > > Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: > > address Aleksey review comments Thanks for taking care of this port! I just have one note about a comment src/hotspot/cpu/arm/templateTable_arm.cpp line 2665: > 2663: __ ldrb(index, Address(cache, in_bytes(ResolvedIndyEntry::result_type_offset()))); > 2664: // load return address > 2665: // Return address is loaded into link register(lr) and not pushed to the stack This comment is a leftover personal note about how ARM works compared to x86, which I was more familiar with at the time. I think this is intuitive and probably not necessary anymore. ------------- Marked as reviewed by matsaave (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/13596#pullrequestreview-1398682068 PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1175710375 From rkennke at openjdk.org Mon Apr 24 19:45:59 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 24 Apr 2023 19:45:59 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v60] In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 20:00:58 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 156 commits: >> >> - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2 >> - A few more LM_ prefixes in 32bit code >> - Replace UseHeavyMonitor with LockingMode == LM_MONITOR >> - Prefix LockingMode constants with LM_* >> - Bunch of comments and typos >> - Don't use NativeAccess in LockStack::contains() >> - RISCV update >> - Put back thread type check in OS::is_lock_owned() >> - Named constants for LockingMode >> - Address David's review comments >> - ... and 146 more: https://git.openjdk.org/jdk/compare/d2ce04bb...d0a448c6 > > Hi there, > what is needed to bring this PR over the approval line? > @rkennke - I'm planning to do another crawl thru review next week. Thanks! That is greatly appreciated! ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1520728296 From cslucas at openjdk.org Mon Apr 24 19:50:22 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 24 Apr 2023 19:50:22 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Sat, 22 Apr 2023 01:12:32 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 10 commits: >> >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - Merge master >> - Add support for rematerializing scalar replaced objects participating in allocation merges > > src/hotspot/share/code/debugInfo.cpp line 232: > >> 230: // If we call select again on the same merge we should return the same result >> 231: if (_selected != nullptr) { >> 232: return _selected; > > I'm not sure I understand how it is intended to work. The code below initializes `_selected`, but returns `nullptr` when `selector >= 0`. Subsequent calls will return non-null value. This can be improved. I'll fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1175715702 From dcubed at openjdk.org Mon Apr 24 20:11:56 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 24 Apr 2023 20:11:56 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 11:15:47 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time).
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. 
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove unnecessary comments

This project is currently baselined on jdk-21+19-1510.
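
The thread-local lock-stack described in the quoted PR text (a small fixed-size array that is scanned to answer "does the current thread own me?", with overflow and recursive locking falling back to the slow path) might be sketched roughly as follows. This is a simplified illustration in Java under stated assumptions, not the HotSpot `LockStack` implementation: the class and method names are hypothetical, and the mark-word CAS, monitor inflation, and GC root scanning are all left out.

```java
public class LockStackSketch {
    // Fixed capacity, following the "currently with 8 elements" remark in the quoted description.
    public static final int CAPACITY = 8;

    private final Object[] elements = new Object[CAPACITY];
    private int top = 0;

    public boolean isFull() {
        return top == CAPACITY;
    }

    // Answers "does the current thread own this lock?" with a short linear scan by identity.
    public boolean contains(Object lockee) {
        for (int i = 0; i < top; i++) {
            if (elements[i] == lockee) {
                return true;
            }
        }
        return false;
    }

    // Fast-path lock attempt. Returning false stands for "take the slow path and
    // inflate to a monitor": either the stack is full or the lock is recursive.
    public boolean tryPush(Object lockee) {
        if (isFull() || contains(lockee)) {
            return false;
        }
        elements[top++] = lockee;
        return true;
    }

    // Fast-path unlock: removes the entry and closes the gap. Returns false if absent.
    public boolean remove(Object lockee) {
        for (int i = top - 1; i >= 0; i--) {
            if (elements[i] == lockee) {
                System.arraycopy(elements, i + 1, elements, i, top - i - 1);
                elements[--top] = null;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LockStackSketch stack = new LockStackSketch();
        Object a = new Object();
        System.out.println("first lock on fast path: " + stack.tryPush(a));
        System.out.println("recursive lock on fast path: " + stack.tryPush(a));
        System.out.println("owned by this thread: " + stack.contains(a));
    }
}
```

The linear scan is a deliberate choice in this sketch: with at most eight entries, and typically far fewer according to the description, it stays cheap without any hashing or per-object owner field.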
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1520761332 From cjplummer at openjdk.org Mon Apr 24 20:26:12 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 24 Apr 2023 20:26:12 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path [v2] In-Reply-To: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> References: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> Message-ID: On Mon, 24 Apr 2023 09:25:01 GMT, Kim Barrett wrote: >> Please review this change to the string deduplication thread to make it a kind >> of JavaThread rather than a ConcurrentGCThread. There are several pieces to >> this change: >> >> (1) New class StringDedupThread (derived from JavaThread), separate from >> StringDedup::Processor (which is now just a CHeapObj instead of deriving from >> ConcurrentGCThread). The thread no longer needs to or supports being stopped, >> like other similar threads. It also needs to be started later, once Java >> threads are supported. Also don't need an explicit visitor, since it will be >> in the normal Java threads list. This separation made the changeover a little >> cleaner to develop, and made the serviceability support a little cleaner too. >> >> (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, >> instead of using the SuspendibleThreadSet facility. >> >> (3) Because we're using ThreadBlockInVM, which has a different usage style >> from STS, the tracking of time spent by the processor blocked for safepoints >> doesn't really work. It's not very important anyway, since normal thread >> descheduling can also affect the normal processing times being gathered and >> reported. So we just drop the so-called "blocked" time and associated >> infrastructure, simplifying Stat tracking a bit.
Also renamed the >> "concurrent" stat to be "active", since it's all in a JavaThread now. >> >> (4) To avoid #include problems, moved the definition of >> JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, >> where one of the functions it calls also is defined. >> >> (5) Added serviceability support for the new thread. >> >> Testing: >> mach5 tier1-3 with -XX:+UseStringDeduplication. >> The test runtime/cds/DeterministicDump.java fails intermittently with that >> option, which is not surprising - see JDK-8306712. >> >> I was never able to reproduce the failure; it's likely quite timing sensitive. >> The fix of changing the type is based on StefanK's comment that ZResurrection >> doesn't expect a non-Java thread to perform load-barriers. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > fix include order SA changes look good ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13607#pullrequestreview-1398739628 From cjplummer at openjdk.org Mon Apr 24 20:29:20 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 24 Apr 2023 20:29:20 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> Message-ID: On Sat, 22 Apr 2023 08:01:20 GMT, Alan Bateman wrote: >> Thank you for the catch. Will check it. I have to extend the test to cover the BoundVirtualThread case enabled with the flag `-XX:-VMContinuations`. > > The scenario that I'm wondering about is where a virtual thread is resumed at around the same time that JVMTI StopThread is called. Not easy to test. Seems we should have a stress test for that.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1175750223 From cslucas at openjdk.org Tue Apr 25 00:14:13 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 25 Apr 2023 00:14:13 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Sat, 22 Apr 2023 01:52:37 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - Merge master >> - Add support for rematerializing scalar replaced objects participating in allocation merges > > src/hotspot/share/code/debugInfo.cpp line 257: > >> 255: } else { >> 256: assert(selector < _possible_objects.length(), "sanity"); >> 257: _selected = (ObjectValue*) _possible_objects.at(selector); > > Any particular reason to reuse `ObjectValue` from `_possible_objects` instead of allocating a fresh one (as you do on `selector == -1` bracnh)? I'd prefer `ObjectMergeValue::select()` to always allocate a fresh `ObjectValue` when converting `ObjectMergeValue` + `ObjectMergeCandidateValue` into `ObjectValue`. 
@iwanowww - may I ask why always allocating a fresh object might be better than returning a pointer to a previous "selected" object? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1175897406 From cslucas at openjdk.org Tue Apr 25 00:38:23 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 25 Apr 2023 00:38:23 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Sat, 22 Apr 2023 01:42:41 GMT, Vladimir Ivanov wrote: > Does it make sense to introduce 3 different subclasses under ObjectValue to clearly distinguish the scenarios? I think that's a good idea. I'll give it a shot. Thanks. > src/java.base/share/classes/java/security/AccessController.java line 786: > >> 784: // allocation merge Phi leading to it) might become NonEscaping and get >> 785: // scalar replaced. The call below enforces 'result' to always escape. >> 786: ensureMaterializedForStackWalk(result); > > Why don't you add the same call in the other `executePrivileged` overload? It has the very same code shape. Totally missed that! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1175906046 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1175905602 From sspitsyn at openjdk.org Tue Apr 25 02:19:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Apr 2023 02:19:18 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> Message-ID: On Mon, 24 Apr 2023 20:26:31 GMT, Chris Plummer wrote: >> The scenario that I'm wondering about is where a virtual thread is resumed at around the same time that JVMTI StopThread is called. Not easy to test. 
> > Seems we should have a stress test for that. > The scenario that I'm wondering about is where a virtual thread is resumed > at around the same time that JVMTI StopThread is called. This kind of scenario is not typical. The debugger should keep virtual thread suspended when making a call to JVMTI StopThread. Normally, this kind of race should never happen with the JDWP agent. Chris, why do you think it is important to have a stress test for this? Also, do you have any testing scenario in mind? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1175944953 From cjplummer at openjdk.org Tue Apr 25 03:52:07 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 25 Apr 2023 03:52:07 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> Message-ID: <7mPRSjn63r8Of6T5JG-cc7wWRiQ8ZEY_LbY6aKqAg1s=.a7cd70c7-e431-45de-9bf8-84546e098674@github.com> On Tue, 25 Apr 2023 02:15:54 GMT, Serguei Spitsyn wrote: >> Seems we should have a stress test for that. > >> The scenario that I'm wondering about is where a virtual thread is resumed >> at around the same time that JVMTI StopThread is called. > > This kind of scenario is not typical. > The debugger should keep virtual thread suspended when making a call to JVMTI StopThread. > Normally, this kind of race should never happen with the JDWP agent. > Chris, why do you think it is important to have a stress test for this? > Also, do you have any testing scenario in mind? I guess if it is not something that would typically ever be done in realistic situations, then there is no need for a stress test. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1175979665 From sspitsyn at openjdk.org Tue Apr 25 04:27:07 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Apr 2023 04:27:07 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v4] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tiers 1-6 are in progress. > Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads.
> > Also, two JCK JVMTI tests are failing in the tier-6 : >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: 1. Address review comments 2. Clear interrupt bit in the TestTaskThread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/d2cc010e..dbdb4edd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=02-03 Stats: 53 lines in 1 file changed: 31 ins; 7 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From sspitsyn at openjdk.org Tue Apr 25 04:36:09 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Apr 2023 04:36:09 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: <7mPRSjn63r8Of6T5JG-cc7wWRiQ8ZEY_LbY6aKqAg1s=.a7cd70c7-e431-45de-9bf8-84546e098674@github.com> References: <70mPVNX3n2TUacbWW0JDIfTNEACwzppdyY5PzYZxdRY=.749a3e99-be99-45ab-a688-e9c563dd5182@github.com> <7mPRSjn63r8Of6T5JG-cc7wWRiQ8ZEY_LbY6aKqAg1s=.a7cd70c7-e431-45de-9bf8-84546e098674@github.com> Message-ID: On Tue, 25 Apr 2023 03:49:09 GMT, Chris Plummer wrote: >>> The scenario that I'm wondering about is where a virtual thread is resumed >>> at around the same time that JVMTI StopThread is called. >> >> This kind of scenario is not typical. >> The debugger should keep virtual thread suspended when making a call to JVMTI StopThread. >> Normally, this kind of race should never happen with the JDWP agent. 
>> Chris, why do you think it is important to have a stress test for this? >> Also, do you have any testing scenario in mind? > > I guess if it is not something that would typically ever be done in realistic situations, then there is no need for a stress test. Okay, thanks. I've extended the test to cover BoundVirtualThreads as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1175995873 From sspitsyn at openjdk.org Tue Apr 25 04:36:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Apr 2023 04:36:11 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v3] In-Reply-To: References: <7nu2pq19SIw1me07wf4RnLEVErjYF3tnNnHZNcNWRlg=.8eb78564-8a56-4f59-a692-c7f79aef6ade@github.com> Message-ID: On Sat, 22 Apr 2023 00:09:27 GMT, Serguei Spitsyn wrote: >> For the JDI tests I added, I execute them in both modes, with the appropriate adjustments to account for the errors we expect for virtual threads. We should be testing to make sure that StopThread works with platform threads under a variety of situations. > > Extending this test to cover platform threads does not look natural and is going to be a little ugly. > But I can extend it to provide coverage for BoundVirtualThread case > which is hijacking the platform thread implementation. > Would it help? > We should have pretty good coverage of the JVMTI `StopThread` for platform threads in `nsk.jvmti` test suite. > It includes: > - `stopthrd006` and `stopthrd007` > - a number of `scenarios/capability/CM01` tests > > Also, this extension does not touch the code path of platform threads support. I've extended the test to cover platform threads as well.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1175995443 From shade at openjdk.org Tue Apr 25 09:38:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Apr 2023 09:38:16 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v3] In-Reply-To: References: Message-ID: <_CgRZQoEdW0UVliaR79DagjC2Dnxma_wgyP4bqG7Sy4=.f07fbbf2-0d0b-4c78-b1d6-5aec1956a36f@github.com> On Mon, 24 Apr 2023 18:09:45 GMT, Aleksei Voitylov wrote: >> Provides missing implementation for arm32. >> >> Testing: hotspot/jtreg. > > Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: > > address Aleksey review comments Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13596#pullrequestreview-1399511152 From avoitylov at openjdk.org Tue Apr 25 09:38:14 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 25 Apr 2023 09:38:14 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v4] In-Reply-To: References: Message-ID: <0t8Fp7BDksOpJK4c2fU93aMSvGS8SthjDFk1LM4BFi0=.4614b9e8-407f-4071-ba68-29c18f2b929a@github.com> > Provides missing implementation for arm32. > > Testing: hotspot/jtreg. 
Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: address Matias comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13596/files - new: https://git.openjdk.org/jdk/pull/13596/files/ebcaeb97..38d59135 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13596&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13596&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13596/head:pull/13596 PR: https://git.openjdk.org/jdk/pull/13596 From shade at openjdk.org Tue Apr 25 09:38:18 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Apr 2023 09:38:18 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v2] In-Reply-To: References: <13qgImKsWoIdZ9rfUtnJswWLFopJhzou6C8CXUiNqnA=.c84bd9f0-d0c6-412c-91c1-2f5c33bac64e@github.com> Message-ID: <4QLTcIu8nFLPADYkXAf1Em9nfIO089A0iRMJa3uv9iU=.58d4c75f-0a93-4477-b63e-0ff57b3f780b@github.com> On Mon, 24 Apr 2023 18:39:00 GMT, Aleksey Shipilev wrote: >> Well, volatile_barrier is essentially a shell around membar, so here I'm just following the style of the code around. > > Honestly, the use of `TemplateTable::volatile_barrier` in `get_cache_and_index_and_bytecode_at_bcp` looks like a fluke/leftover from AArch64 port. `__ membar` is used everywhere else. It would not stand in the way of integrating this PR, but do consider `__ membar`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1176256800 From dnsimon at openjdk.org Tue Apr 25 10:15:20 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 25 Apr 2023 10:15:20 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v5] In-Reply-To: References: Message-ID: <59vbUMEGBW0WeGqNNHnq1HnsJHJ5zfdlO82sxmmZACY=.78415fc5-d462-49ff-9967-1d33aa94ea95@github.com> On Mon, 24 Apr 2023 16:50:59 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - Replace NULL with nullptr in new code > - Merge branch 'master' into tkr-zgc > - Review fixes > - ... and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391 src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 891: > 889: * {@code base} is a {@link HotSpotConstantPool}, {@link HotSpotMethodData}, {@link HotSpotObjectConstantImpl}, > 890: * or {@link HotSpotResolvedObjectTypeImpl} then the field > 891: * corresopnding to {@code displacement} is fetched using the appropriate HotSpot accessor. 
Any corresopnding -> corresponding ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1176304437 From sspitsyn at openjdk.org Tue Apr 25 10:17:10 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Apr 2023 10:17:10 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 13:33:53 GMT, Alan Bateman wrote: >> What does "suspended at an event" mean? As a programmer trying to use this how am I supposed to know when it can be used without getting an error? >> >> I find it very surprising that the error would occur with an unmounted thread - having a VT throw when it was remounted seems the most natural way to implement this. > > I think "suspended at an event" is okay. It means the callback for an event has been triggered and the agent suspended the thread. The typical use-case for JVMTI StopThread is when at a breakpoint or when single stepping and the user asks the debugger to throw some exception so that the code's handling of the exception can be debugged/tested. Debugger and JDWP agent aside, I don't know if there are other agents using this JVMTI function. If there are others and they call this function on some random virtual thread at some random time then the function will fail. > > One other point around this is that the plan is to have StopThread, ForceEarlyReturn, PopFrame and SetLocalXXX work more consistently. Right now, SetLocalXXX minimally requires a virtual thread be suspended at a breakpoint or single step event. The minimum support can be broader to be suspended at any event. Alan, thank you for explaining this. Resolving this conversation. David, please, reopen it if it is not okay with you.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1176306009 From tschatzl at openjdk.org Tue Apr 25 10:46:16 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Apr 2023 10:46:16 GMT Subject: RFR: 8306732: TruncatedSeq::predict_next() attempts linear regression with only one data point In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 11:51:06 GMT, Axel Boldt-Christmas wrote: > TruncatedSeq::predict_next() attempts linear regression with only one data point, this leads to a division by zero. (There are infinitely many linear functions that fit equally well for a single point). > > I suggest we do what we do for the zero points case, namely pick one of the linear functions. > > For zero points the current version picks `y = 0 + 0*x` and the suggestion is that for one point `P` the function `y = P_y + 0*x` is picked. > > Tested for ZGC tier1-7 on Oracle supported platforms. Only ZGC uses TruncatedSeq::predict_next() Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13614#pullrequestreview-1399637905 From shade at openjdk.org Tue Apr 25 11:07:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Apr 2023 11:07:16 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v11] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it.
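The rounding described above can be observed from user code. What follows is only a minimal sketch and not part of the PR: the class name `SubMillisSleepProbe` and the helper `timedSleep` are hypothetical, and the only API assumed is the standard `Thread.sleep(long millis, int nanos)` together with `System.nanoTime()`. On a JDK that rounds `nanos` up, a 200 us request should report roughly 1 ms or more; with sub-millisecond support it may report a value closer to the request, depending on the platform timers.

```java
public class SubMillisSleepProbe {
    /**
     * Requests a sleep of {@code nanos} nanoseconds (0..999999) and returns
     * the elapsed wall-clock time in nanoseconds as seen by System.nanoTime().
     */
    static long timedSleep(int nanos) {
        long start = System.nanoTime();
        try {
            Thread.sleep(0, nanos);
        } catch (InterruptedException e) {
            // Restore the interrupt status and report the failure to the caller.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sleeping", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int requested = 200_000; // 200 microseconds

        // A few warm-up calls so the measured calls are less affected by class loading and JIT.
        for (int i = 0; i < 5; i++) {
            timedSleep(requested);
        }

        // Keep the shortest observed sleep; it is the least disturbed by scheduler noise.
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 20; i++) {
            best = Math.min(best, timedSleep(requested));
        }
        System.out.printf("requested %d ns, shortest observed %d ns%n", requested, best);
    }
}
```

The probe reports the minimum over several runs because a single measurement mostly reflects scheduling jitter rather than the rounding policy.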
Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Fix Amazon copyright - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Drop nanos_to_nanos_bounded - Handle overflows - More review comments - Adjust test times - Windows again - Windows fixes: align(...) is only for power-of-two alignments - Adjust assert - ... 
and 15 more: https://git.openjdk.org/jdk/compare/f968da97...e0a36cf7 ------------- Changes: https://git.openjdk.org/jdk/pull/13225/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=10 Stats: 254 lines in 11 files changed: 226 ins; 9 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From jbhateja at openjdk.org Tue Apr 25 12:22:20 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Apr 2023 12:22:20 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 13:46:12 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. >> >> Please take a look and have some reviews. Thank you very much. 
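As an illustration of the slice semantics described above (concatenate the two inputs, then extract a window of the input size starting at the origin lane), here is a scalar reference model on plain `int` arrays. It is only a sketch: the class `SliceModel` and its `slice` method are hypothetical names, it does not use `jdk.incubator.vector`, and the accepted origin range of 0 to the lane count inclusive is an assumption about the API contract rather than something stated in this thread.

```java
import java.util.Arrays;

public class SliceModel {
    /**
     * Scalar model of a vector slice: lane i of the result is lane (origin + i)
     * of the conceptual concatenation of v1 followed by v2.
     */
    static int[] slice(int[] v1, int[] v2, int origin) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("inputs must have the same lane count");
        }
        int vlen = v1.length;
        // Assumed contract: origin may range from 0 to vlen inclusive.
        if (origin < 0 || origin > vlen) {
            throw new IndexOutOfBoundsException("origin out of range: " + origin);
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = origin + i;
            // Lanes below vlen come from v1, the remainder continues into v2.
            result[i] = (j < vlen) ? v1[j] : v2[j - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3};
        int[] b = {4, 5, 6, 7};
        // Window of four lanes starting at lane 1 of the concatenation a ++ b.
        System.out.println(Arrays.toString(slice(a, b, 1)));
    }
}
```

With an origin of 0 the model returns a copy of the first input, and with an origin equal to the lane count it returns a copy of the second, which makes it a convenient oracle when checking an intrinsified implementation against expected lane values.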
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style src/hotspot/cpu/x86/x86.ad line 7953: > 7951: __ punpckldq($dst$$XMMRegister, $src$$XMMRegister); > 7952: } > 7953: __ psrldq($dst$$XMMRegister, $origin$$constant * type2aelembytes(bt)); Move it to a new macro assembly routine. src/hotspot/cpu/x86/x86.ad line 7962: > 7960: !VM_Version::supports_ssse3()); > 7961: match(Set dst (VectorSlice (Binary dst src) origin)); > 7962: effect(TEMP xtmp); Please also associate TEMP_DEF / TEMP with dst to avoid early source overwrite in case dst/src are allocated same register. src/hotspot/cpu/x86/x86.ad line 7970: > 7968: __ movdqu($xtmp$$XMMRegister, $src$$XMMRegister); > 7969: __ pslldq($xtmp$$XMMRegister, 16 - shift_count); > 7970: __ por($dst$$XMMRegister, $xtmp$$XMMRegister); Move to macro assembly routine. src/hotspot/cpu/x86/x86.ad line 8007: > 8005: } > 8006: __ vpsrldq($dst$$XMMRegister, $dst$$XMMRegister, shift_count, Assembler::AVX_128bit); > 8007: } Move to macro assembly routine. src/hotspot/cpu/x86/x86.ad line 8063: > 8061: (type2aelembytes(Matcher::vector_element_basic_type(n)) * n->in(2)->get_int()) % 4U != 0 && > 8062: (type2aelembytes(Matcher::vector_element_basic_type(n)) * n->in(2)->get_int() < 16 || > 8063: type2aelembytes(Matcher::vector_element_basic_type(n)) * n->in(2)->get_int() > 48)); Move these bulky predications to source_hpp section, like done at https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8786 src/hotspot/cpu/x86/x86.ad line 8082: > 8080: type2aelembytes(Matcher::vector_element_basic_type(n)) * n->in(2)->get_int() > 16 && > 8081: type2aelembytes(Matcher::vector_element_basic_type(n)) * n->in(2)->get_int() < 48); > 8082: match(Set dst (VectorSlice (Binary src1 src2) origin)); Same as above. 
src/hotspot/cpu/x86/x86.ad line 8099: > 8097: (Matcher::vector_length_in_bytes(n) == 64 || > 8098: (Matcher::vector_length_in_bytes(n) == 32 && > 8099: VM_Version::supports_avx512vl()))); Same as above. src/hotspot/share/opto/vectorIntrinsics.cpp line 1914: > 1912: if (vector_klass->const_oop() == NULL || elem_klass->const_oop() == NULL || > 1913: !vlen->is_con() || !origin_type->is_con()) { > 1914: if (C->print_intrinsics()) { Hi @merykitty , your inline expander is not handling non-constant origin case, this will introduce performance regressions w.r.t to existing implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176428922 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176424735 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176429190 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176429410 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176428080 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176428309 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176428542 PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176407424 From jbhateja at openjdk.org Tue Apr 25 12:22:22 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Apr 2023 12:22:22 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 11:57:21 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style > > src/hotspot/share/opto/vectorIntrinsics.cpp line 1914: > >> 1912: if (vector_klass->const_oop() == NULL || elem_klass->const_oop() == NULL || >> 1913: !vlen->is_con() || !origin_type->is_con()) { >> 1914: if (C->print_intrinsics()) { > > Hi @merykitty , your inline expander is not handling non-constant origin case, 
this will introduce performance regressions w.r.t to existing implementation. You can extend expander to generate IR corresponding to fallback implementation to handle non-constant origin case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1176410139 From lucy at openjdk.org Tue Apr 25 13:00:14 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 25 Apr 2023 13:00:14 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Thu, 13 Apr 2023 16:32:19 GMT, Martin Doerr wrote: > What about using `stop` or anything which doesn't take a condition? That would avoid confusion. That is a good idea. It's of no benefit to check for the complement of the condition above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1176477573 From eosterlund at openjdk.org Tue Apr 25 13:05:18 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 25 Apr 2023 13:05:18 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v5] In-Reply-To: References: Message-ID: <3Ji_-G2uNn-piFj5M0v1q-LWQyr2sy0gt3F6lBztO9o=.7668f309-d3fe-42a2-8132-a1564ee7880b@github.com> On Mon, 24 Apr 2023 16:50:59 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred.
> > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - Replace NULL with nullptr in new code > - Merge branch 'master' into tkr-zgc > - Review fixes > - ... and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391 So do you have AArch64 nmethod entry barriers both for ZGC and the others implemented in Graal? Is it the same instruction sequence as we have? ------------- PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1521754081 From eosterlund at openjdk.org Tue Apr 25 13:05:21 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 25 Apr 2023 13:05:21 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v2] In-Reply-To: References: Message-ID: <_DsEyNUfGQeIa0CRnoxWVgoYw0UY7sdOAs4gjmG8QIU=.7dc73529-864e-41c4-a7ee-d7044f4ab3c2@github.com> On Fri, 20 Jan 2023 21:16:12 GMT, Tom Rodriguez wrote: >> src/hotspot/share/jvmci/jvmciCodeInstaller.cpp line 732: >> >>> 730: >>> 731: if (UseZGC && _nmethod_entry_patch_offset == -1) { >>> 732: // ZGC requires the use of entry barriers for correctness >> >> Note that G1 also requires nmethod entry barriers for correctness. > > We've been happily running without them for a long time which is why we're just now getting around to supporting them. Are we sitting on a ticking time bomb? I do want to make them required as of 21 but maybe we could defer that to a future commit once my Graal changes to support the entry barriers has landed? Yeah unfortunately it is a ticking time bomb. Will you fix it for 21?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1176482629 From mdoerr at openjdk.org Tue Apr 25 13:27:20 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Apr 2023 13:27:20 GMT Subject: RFR: 8306823: Native memory leak in SharedRuntime::notify_jvmti_unmount/mount. Message-ID: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> The code introduced by [JDK-8304303](https://bugs.openjdk.org/browse/JDK-8304303) uses `JNIHandles::make_local` which requires `JNIHandles::destroy_local` which is currently missing. ------------- Commit messages: - 8306823: Native memory leak in SharedRuntime::notify_jvmti_unmount/mount. Changes: https://git.openjdk.org/jdk/pull/13641/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13641&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306823 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13641.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13641/head:pull/13641 PR: https://git.openjdk.org/jdk/pull/13641 From avoitylov at openjdk.org Tue Apr 25 14:02:14 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 25 Apr 2023 14:02:14 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v3] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 19:41:06 GMT, Matias Saavedra Silva wrote: >> Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: >> >> address Aleksey review comments > > src/hotspot/cpu/arm/templateTable_arm.cpp line 2665: > >> 2663: __ ldrb(index, Address(cache, in_bytes(ResolvedIndyEntry::result_type_offset()))); >> 2664: // load return address >> 2665: // Return address is loaded into link register(lr) and not pushed to the stack > > This comment is a leftover personal note about how ARM works compared to x86, which I was more familiar with at the time. 
I think this is intuitive and probably not necessary anymore. Comment is removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1176259956 From avoitylov at openjdk.org Tue Apr 25 14:02:11 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 25 Apr 2023 14:02:11 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v2] In-Reply-To: <4QLTcIu8nFLPADYkXAf1Em9nfIO089A0iRMJa3uv9iU=.58d4c75f-0a93-4477-b63e-0ff57b3f780b@github.com> References: <13qgImKsWoIdZ9rfUtnJswWLFopJhzou6C8CXUiNqnA=.c84bd9f0-d0c6-412c-91c1-2f5c33bac64e@github.com> <4QLTcIu8nFLPADYkXAf1Em9nfIO089A0iRMJa3uv9iU=.58d4c75f-0a93-4477-b63e-0ff57b3f780b@github.com> Message-ID: <2krHFbmAZS0-KhtRoEAuC6QzKbqlfwJG3Y3QvciNShE=.6a5fff7f-29d1-4658-858e-d1f60f04eb75@github.com> On Tue, 25 Apr 2023 09:31:07 GMT, Aleksey Shipilev wrote: >> Honestly, the use of `TemplateTable::volatile_barrier` in `get_cache_and_index_and_bytecode_at_bcp` looks like a fluke/leftover from AArch64 port. `__ membar` is used everywhere else. > > It would not stand in the way of integrating this PR, but do consider `__ membar`. volatile_barrier is widely used in templateTable_arm.cpp instead of plain membar (17 times vs 1 time). Let me file an enhancement to clean this up for the whole file. It does look artificial and specific to TemplateTable only. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13596#discussion_r1176300103 From kbarrett at openjdk.org Tue Apr 25 14:43:13 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Apr 2023 14:43:13 GMT Subject: RFR: 8306732: TruncatedSeq::predict_next() attempts linear regression with only one data point In-Reply-To: References: Message-ID: <16mP5w0wYOJQ1nPM3oT9xRRHOXASrpVxCDlHeBUTGJs=.e9609b54-a60e-45a2-8feb-cbe3ac0efcf7@github.com> On Mon, 24 Apr 2023 11:51:06 GMT, Axel Boldt-Christmas wrote: > TruncatedSeq::predict_next() attempts linear regression with only one data point, this leads to a division by zero. (There are infinit many linear functions that fit equally well for a single point). > > I suggest we do what we do for the zero points case, namely pick one of the linear functions. > > For zero points the current version picks `y = 0 + 0*x` and the suggestion is that for one point `P` the function `y = P_y + 0*x` is picked. > > Tested for ZGC tier1-7 on Oracle supported platforms. Only ZGC use TruncatedSeq::predict_next() Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13614#pullrequestreview-1400081485 From pchilanomate at openjdk.org Tue Apr 25 15:08:15 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 25 Apr 2023 15:08:15 GMT Subject: RFR: 8306823: Native memory leak in SharedRuntime::notify_jvmti_unmount/mount. In-Reply-To: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> References: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> Message-ID: On Tue, 25 Apr 2023 13:19:36 GMT, Martin Doerr wrote: > The code introduced by [JDK-8304303](https://bugs.openjdk.org/browse/JDK-8304303) uses `JNIHandles::make_local` which requires `JNIHandles::destroy_local` which is currently missing. Looks good to me. 
Patricio ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13641#pullrequestreview-1400139322 From never at openjdk.org Tue Apr 25 15:18:12 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 25 Apr 2023 15:18:12 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v2] In-Reply-To: <_DsEyNUfGQeIa0CRnoxWVgoYw0UY7sdOAs4gjmG8QIU=.7dc73529-864e-41c4-a7ee-d7044f4ab3c2@github.com> References: <_DsEyNUfGQeIa0CRnoxWVgoYw0UY7sdOAs4gjmG8QIU=.7dc73529-864e-41c4-a7ee-d7044f4ab3c2@github.com> Message-ID: On Tue, 25 Apr 2023 13:00:48 GMT, Erik Österlund wrote: >> We've been happily running without them for a long time which is why we're just now getting around to supporting them. Are we sitting on a ticking time bomb? I do want to make them required as of 21 but maybe we could defer that to a future commit once my Graal changes to support the entry barriers has landed? > > Yeah unfortunately it is a ticking time bomb. Will you fix it for 21? Yes I'll fix this here and also update our labsjdk to require it now. Graal will now emit them for all GCs. This code was really there to deal with the transition period where we needed a working JDK but Graal didn't support it yet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1176673831 From never at openjdk.org Tue Apr 25 15:22:33 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 25 Apr 2023 15:22:33 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v5] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 16:50:59 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage.
I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - Replace NULL with nullptr in new code > - Merge branch 'master' into tkr-zgc > - Review fixes > - ... and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391 Yes we emit barriers exactly as HotSpot does for both aarch64 and amd64 though we don't support conc_instruction_and_data_patch yet because it's never enabled in the current code. I refactored NativeNMethodBarrier::check_barrier to verify the barrier at code install time so that we get exceptions if there's a mismatch between HotSpot and Graal, though the verification on the aarch64 side is very weak. It should probably at least check for the properly placed dmb. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1521982961 From rrich at openjdk.org Tue Apr 25 15:23:35 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 25 Apr 2023 15:23:35 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: <220qeklIH1qHdlGlQtQ2qz3TJmCz_P5zMho1eaVQ0Ps=.4dd8a303-943f-4e80-884a-2e69e33ba881@github.com> References: <220qeklIH1qHdlGlQtQ2qz3TJmCz_P5zMho1eaVQ0Ps=.4dd8a303-943f-4e80-884a-2e69e33ba881@github.com> Message-ID: On Mon, 24 Apr 2023 08:52:48 GMT, Richard Reingruber wrote: > I'll do another round of testing. 
Thanks, Richard. Tests are clean. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13477#issuecomment-1521984550 From vkempik at openjdk.org Tue Apr 25 15:48:12 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 25 Apr 2023 15:48:12 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled Message-ID: Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: 169598 trp_lam 13562 trp_sam after the patch both numbers are zeroes. I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) ------------- Commit messages: - clean up comments - refix branch in interp - inline load_unsigned_short_at_bcp - Fix risc-v - merge - revert codebuffer change in favor of upstream changes - Fix issue with index being t1 in get_cache_index_at_bcp - Prevent misalligned memory access in code writer and Template Interp Changes: https://git.openjdk.org/jdk/pull/13645/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291550 Stats: 164 lines in 11 files changed: 104 ins; 0 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From duke at openjdk.org Tue Apr 25 18:08:09 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Apr 2023 18:08:09 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive Message-ID: This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. 
Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. ------------- Commit messages: - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive Changes: https://git.openjdk.org/jdk/pull/13652/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13652&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306460 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13652/head:pull/13652 PR: https://git.openjdk.org/jdk/pull/13652 From sspitsyn at openjdk.org Tue Apr 25 18:18:10 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Apr 2023 18:18:10 GMT Subject: RFR: 8306823: Native memory leak in SharedRuntime::notify_jvmti_unmount/mount. In-Reply-To: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> References: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> Message-ID: On Tue, 25 Apr 2023 13:19:36 GMT, Martin Doerr wrote: > The code introduced by [JDK-8304303](https://bugs.openjdk.org/browse/JDK-8304303) uses `JNIHandles::make_local` which requires `JNIHandles::destroy_local` which is currently missing. Looks good. Thank you for taking care about it. Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13641#pullrequestreview-1400466729 From sspitsyn at openjdk.org Tue Apr 25 18:27:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Apr 2023 18:27:18 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v5] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. 
> > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended in a mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended in an unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tiers 1-6 are in progress. > Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in tier-6: >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior.
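The rules quoted above for `StopThread` on a virtual thread can be restated as a small decision function. This is only a sketch of the described behavior, not the HotSpot implementation: `StopResult`, `VThreadState` and `stop_virtual_thread_check` are hypothetical names, and the enum values are stand-ins that merely mirror the JVMTI error names rather than the `jvmti.h` constants.

```cpp
#include <cassert>

// Illustrative stand-ins only; these are NOT the jvmti.h error constants.
enum class StopResult { Ok, ThreadNotSuspended, OpaqueFrame };

// Hypothetical snapshot of the target virtual thread's state.
struct VThreadState {
    bool is_current;    // the target is the calling thread
    bool is_suspended;  // the target is suspended
    bool is_mounted;    // the target is mounted on a carrier thread
};

// Mirrors the prose: allowed if current, or suspended while mounted.
inline StopResult stop_virtual_thread_check(const VThreadState& t) {
    if (t.is_current) {
        return StopResult::Ok;
    }
    if (!t.is_suspended) {
        // Neither current nor suspended.
        return StopResult::ThreadNotSuspended;
    }
    if (!t.is_mounted) {
        // Suspended while unmounted.
        return StopResult::OpaqueFrame;
    }
    return StopResult::Ok;
}
```

The current-thread check comes first because a current thread is by definition mounted, so the remaining two checks only apply to other threads.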
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor tweak in new test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/dbdb4edd..956e8ee8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=03-04 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From vkempik at openjdk.org Tue Apr 25 19:00:10 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 25 Apr 2023 19:00:10 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 15:37:30 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress The failure in linux-x86_32 is unrelated (TestJNICriticalStressTest). I have only changed code in src/hotspot/cpu/riscv/ and nothing outside of it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1522268129 From mdoerr at openjdk.org Tue Apr 25 19:48:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Apr 2023 19:48:10 GMT Subject: RFR: 8306823: Native memory leak in SharedRuntime::notify_jvmti_unmount/mount. In-Reply-To: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> References: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> Message-ID: On Tue, 25 Apr 2023 13:19:36 GMT, Martin Doerr wrote: > The code introduced by [JDK-8304303](https://bugs.openjdk.org/browse/JDK-8304303) uses `JNIHandles::make_local` which requires `JNIHandles::destroy_local` which is currently missing. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13641#issuecomment-1522327187 From psandoz at openjdk.org Tue Apr 25 21:10:20 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 25 Apr 2023 21:10:20 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v24] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 13:37:15 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> ### Specdiff >> https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html >> >> ### Javadoc >> https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html >> >> ### Tests >> >> Testing excludes tests on the "zero" platform.
>> >> - [X] Tier1 >> - [X] Tier2 >> - [X] Tier3 >> - [X] Tier4 >> - [X] Tier5 >> - [X] Tier6 (Except one test applications/jcstress/init.java as per below) >> >> >> Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) >> at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) >> Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at java.base/java.lang.Class.forName0(Native Method) >> at java.base/java.lang.Class.forName(Class.java:497) >> at java.base/java.lang.Class.forName(Class.java:476) >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) >> ... 1 more > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 42 commits: > > - Merge branch 'master' into PR_21_V2 > - Update test/jdk/java/foreign/TestByteBuffer.java > > Co-authored-by: Andrey Turbanov > - Merge pull request #3 from JornVernee/IsForeignLinkerSupported > > rename has_port > - rename has_port > - Merge pull request #2 from JornVernee/WSL_BB > > account for missing functional in WSL in TestByteBuffer > - account for missing mincore on WSL in TestByteBuffer > - Merge branch 'master' into PR_21_V2 > - 8305369: Issues in zero-length memory segment javadoc section > - 8305087: MemoryLayout API checks should be more eager > - Merge master > - ... and 32 more: https://git.openjdk.org/jdk/compare/9fb53adf...ba04f5cc src/java.base/share/classes/java/lang/foreign/Linker.java line 327: > 325: * MemorySegment allocateMemory(long byteSize, Arena arena) { > 326: * MemorySegment segment = (MemorySegment)malloc.invokeExact(byteSize); // size = 0, scope = always alive > 327: * return segment.reinterpret(size, arena, s -> free.invokeExact(s)); // size = byteSize, scope = arena.scope() Suggestion: * MemorySegment segment = (MemorySegment)malloc.invokeExact(byteSize); // size = 0, scope = always alive * return segment.reinterpret(byteSize, arena, s -> free.invokeExact(s)); // size = byteSize, scope = arena.scope() ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1177057372 From vlivanov at openjdk.org Tue Apr 25 21:22:35 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Apr 2023 21:22:35 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <8OlLs3nmBxAKP_OcaZPhC3g1lNpfXJQ6zYCx2XNB43A=.055edcf3-96b7-4f97-9153-fdfe26bb0c0b@github.com> On Tue, 25 Apr 2023 00:10:53 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/code/debugInfo.cpp line 257: >> >>> 
255: } else { >>> 256: assert(selector < _possible_objects.length(), "sanity"); >>> 257: _selected = (ObjectValue*) _possible_objects.at(selector); >> >> Any particular reason to reuse `ObjectValue` from `_possible_objects` instead of allocating a fresh one (as you do on `selector == -1` branch)? I'd prefer `ObjectMergeValue::select()` to always allocate a fresh `ObjectValue` when converting `ObjectMergeValue` + `ObjectMergeCandidateValue` into `ObjectValue`. > > @iwanowww - may I ask why always allocating a fresh object might be better than returning a pointer to a previous "selected" object? I don't mind there's caching happening if it gives any noticeable benefit. As of now, the code around doesn't care, probably, because it is allocated in resource arena. What I'm against is repurposing existing instances: don't modify a candidate object into a "real object", allocate a fresh one instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1177069013 From coleenp at openjdk.org Tue Apr 25 21:30:07 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Apr 2023 21:30:07 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 18:00:39 GMT, Ashutosh Mehra wrote: > This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. Looks good. ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13652#pullrequestreview-1400779082 From duke at openjdk.org Tue Apr 25 21:36:07 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Apr 2023 21:36:07 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 21:27:48 GMT, Coleen Phillimore wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > Looks good. @coleenp thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1522450361 From cslucas at openjdk.org Tue Apr 25 21:40:15 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 25 Apr 2023 21:40:15 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: <8OlLs3nmBxAKP_OcaZPhC3g1lNpfXJQ6zYCx2XNB43A=.055edcf3-96b7-4f97-9153-fdfe26bb0c0b@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <8OlLs3nmBxAKP_OcaZPhC3g1lNpfXJQ6zYCx2XNB43A=.055edcf3-96b7-4f97-9153-fdfe26bb0c0b@github.com> Message-ID: <5HmQ644vtSgf7NylFeEsQXfU8DC9W-zk3ayYq373LF4=.8fba7762-07b2-45b8-a002-1ad2c0c05b0e@github.com> On Tue, 25 Apr 2023 21:19:06 GMT, Vladimir Ivanov wrote: >> @iwanowww - may I ask why always allocating a fresh object might be better than returning a pointer to a previous "selected" object? > > I don't mind there's caching happening if it gives any noticeable benefit. As of now, the code around doesn't care, probably, because it is allocated in resource arena. > > What I'm against is repurposing existing instances: don't modify a candidate object into a "real object", allocate a fresh one instead. Thanks for clarifying. 
There is one scenario where turning the candidate into a "real object" simplifies the implementation _greatly_. The scenario is when the ObjectValue is not just a candidate, i.e., the ObjectValue is also used independently of the merge. Example: Point p = new Point(); Point q = new Point(); if (cond) p = q; trap(p, q); A second issue is that allocating a fresh ObjectValue will require copying the array of field values from the candidate object to the newly allocated object. That's not a big issue, just pointing that out, though. I propose that we allocate a fresh ObjectValue if the candidate is just a candidate (not used independently of the merge), and if the candidate is not just a candidate we return the existing ObjectValue (turned 'real object'). I have that implemented, I can push it for you to take a look. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1177087887 From vlivanov at openjdk.org Tue Apr 25 21:51:06 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Apr 2023 21:51:06 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 18:00:39 GMT, Ashutosh Mehra wrote: > This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. There are other compiler-related flags (e.g., `JVM_ACC_NOT_C1_COMPILABLE`/`JVM_ACC_NOT_C2_COMPILABLE`/`JVM_ACC_NOT_C2_OSR_COMPILABLE` ) which should not be persisted into CDS archive. Also, I assume the following flags should be always unset in the archived version: `JVM_ACC_IS_PREFIXED_NATIVE`, `JVM_ACC_HAS_RESOLVED_METHOD`, `JVM_ACC_IS_OLD`, `JVM_ACC_IS_OBSOLETE`, `JVM_ACC_ON_STACK`, `JVM_ACC_IS_DELETED`, `JVM_ACC_IS_BEING_REDEFINED`.
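The idea of stripping runtime-only flags before a method is written to the archive can be illustrated with a plain bit mask. This is a sketch under loud assumptions: the bit positions below are invented for the example and are not HotSpot's real `JVM_ACC_*` values, and `kSomePersistentFlag`, `kRuntimeOnlyMask` and `flags_for_archive` are hypothetical names. Only a subset of the flags listed above is shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments chosen for this sketch only;
// they do NOT correspond to the real JVM_ACC_* constants.
constexpr std::uint32_t kQueued             = 1u << 0;
constexpr std::uint32_t kNotC1Compilable    = 1u << 1;
constexpr std::uint32_t kNotC2Compilable    = 1u << 2;
constexpr std::uint32_t kNotC2OsrCompilable = 1u << 3;
constexpr std::uint32_t kSomePersistentFlag = 1u << 4;  // stand-in for a flag that must survive

// All flags that describe transient runtime state.
constexpr std::uint32_t kRuntimeOnlyMask =
    kQueued | kNotC1Compilable | kNotC2Compilable | kNotC2OsrCompilable;

// Returns the flag word with every runtime-only bit cleared.
constexpr std::uint32_t flags_for_archive(std::uint32_t flags) {
    return flags & ~kRuntimeOnlyMask;
}
```

Collecting the bits into one mask keeps the dump-time clearing and a restore-time assert (that none of the masked bits are set) in agreement with each other.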
------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1522463626 From amenkov at openjdk.org Tue Apr 25 22:03:03 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 25 Apr 2023 22:03:03 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v8] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots.
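The thread-reporting rules in the list above amount to a three-way mapping, sketched here for reference. `ThreadKind`, `RootReport` and `root_report_for` are hypothetical names for illustration only; the enumerators merely mirror `JVMTI_HEAP_REFERENCE_THREAD` and `JVMTI_HEAP_REFERENCE_OTHER` and are not the `jvmti.h` constants.

```cpp
#include <cassert>

// Hypothetical classification of the thread being visited.
enum class ThreadKind { Platform, MountedVirtual, UnmountedVirtual };

// Stand-ins mirroring JVMTI_HEAP_REFERENCE_THREAD / JVMTI_HEAP_REFERENCE_OTHER,
// plus a value for "not reported as a heap root at all".
enum class RootReport { Thread, Other, NotReported };

inline RootReport root_report_for(ThreadKind kind) {
    switch (kind) {
        case ThreadKind::Platform:         return RootReport::Thread;       // as before
        case ThreadKind::MountedVirtual:   return RootReport::Other;        // synthetic root
        case ThreadKind::UnmountedVirtual: return RootReport::NotReported;  // not a root
    }
    return RootReport::NotReported;  // unreachable for valid input
}
```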
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: mounted VTs reported as OTHER, unmounted VTs are not reported as roots ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/d95a8426..d149be41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=06-07 Stats: 13 lines in 1 file changed: 9 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From coleenp at openjdk.org Tue Apr 25 22:13:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Apr 2023 22:13:06 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 18:00:39 GMT, Ashutosh Mehra wrote: > This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. Yes, you're right, all these flags shouldn't be in the archive. I have a patch for JDK-8306851 which will make it easier to unset all of these flags (except has_loops/has_loops_init, which we want set in the archive). Maybe this change should wait. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13652#pullrequestreview-1400832595 From duke at openjdk.org Tue Apr 25 22:17:11 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Apr 2023 22:17:11 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 22:10:04 GMT, Coleen Phillimore wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > Yes, you're right, all these flags shouldn't be in the archive. I have a patch for JDK-8306851 which will make it easier to unset all of these flags (except has_loops/has_loops_init, which we want set in the archive). Maybe this change should wait. @coleenp that makes sense. Once your change is in, I can update this to clear all the flags that shouldn't be in the archived version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1522489550 From psandoz at openjdk.org Tue Apr 25 22:19:21 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 25 Apr 2023 22:19:21 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v24] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 13:37:15 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> ### Specdiff >> https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html >> >> ### Javadoc >> https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html >> >> ### Tests >> >> Testing excludes tests on the "zero" platform. 
>> >> - [X] Tier1 >> - [X] Tier2 >> - [X] Tier3 >> - [X] Tier4 >> - [X] Tier5 >> - [X] Tier6 (Except one test applications/jcstress/init.java as per below) >> >> >> Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) >> at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) >> Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at java.base/java.lang.Class.forName0(Native Method) >> at java.base/java.lang.Class.forName(Class.java:497) >> at java.base/java.lang.Class.forName(Class.java:476) >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) >> ... 1 more > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 42 commits: > > - Merge branch 'master' into PR_21_V2 > - Update test/jdk/java/foreign/TestByteBuffer.java > > Co-authored-by: Andrey Turbanov > - Merge pull request #3 from JornVernee/IsForeignLinkerSupported > > rename has_port > - rename has_port > - Merge pull request #2 from JornVernee/WSL_BB > > account for missing functional in WSL in TestByteBuffer > - account for missing mincore on WSL in TestByteBuffer > - Merge branch 'master' into PR_21_V2 > - 8305369: Issues in zero-length memory segment javadoc section > - 8305087: MemoryLayout API checks should be more eager > - Merge master > - ... and 32 more: https://git.openjdk.org/jdk/compare/9fb53adf...ba04f5cc src/java.base/share/classes/jdk/internal/foreign/SlicingAllocator.java line 53: > 51: @Override > 52: public MemorySegment allocate(long byteSize, long byteAlignment) { > 53: Utils.checkAllocationSizeAndAlign(byteSize, byteAlignment); `maxAlign` is no longer used ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1177117787 From amenkov at openjdk.org Tue Apr 25 22:30:41 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 25 Apr 2023 22:30:41 GMT Subject: RFR: 8306027: Clarify JVMTI heap functions spec about virtual thread stack. Message-ID: The fix updates the JVMTI spec's description of heap functions to support virtual threads. Virtual threads are not heap roots by design, so the FollowReferences/IterateOverReachableObjects specs are updated to note only platform threads. References from thread stacks (including virtual threads) are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL, so the description of the values is relaxed.
Also please review related CSR ------------- Commit messages: - spec update Changes: https://git.openjdk.org/jdk/pull/13661/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13661&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306027 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13661/head:pull/13661 PR: https://git.openjdk.org/jdk/pull/13661 From psandoz at openjdk.org Tue Apr 25 22:50:16 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 25 Apr 2023 22:50:16 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v24] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 13:37:15 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> ### Specdiff >> https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html >> >> ### Javadoc >> https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html >> >> ### Tests >> >> Testing excludes tests on the "zero" platform. 
>> >> - [X] Tier1 >> - [X] Tier2 >> - [X] Tier3 >> - [X] Tier4 >> - [X] Tier5 >> - [X] Tier6 (Except one test applications/jcstress/init.java as per below) >> >> >> Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) >> at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) >> Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at java.base/java.lang.Class.forName0(Native Method) >> at java.base/java.lang.Class.forName(Class.java:497) >> at java.base/java.lang.Class.forName(Class.java:476) >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) >> ... 1 more > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 42 commits: > > - Merge branch 'master' into PR_21_V2 > - Update test/jdk/java/foreign/TestByteBuffer.java > > Co-authored-by: Andrey Turbanov > - Merge pull request #3 from JornVernee/IsForeignLinkerSupported > > rename has_port > - rename has_port > - Merge pull request #2 from JornVernee/WSL_BB > > account for missing functional in WSL in TestByteBuffer > - account for missing mincore on WSL in TestByteBuffer > - Merge branch 'master' into PR_21_V2 > - 8305369: Issues in zero-length memory segment javadoc section > - 8305087: MemoryLayout API checks should be more eager > - Merge master > - ... and 32 more: https://git.openjdk.org/jdk/compare/9fb53adf...ba04f5cc test/micro/org/openjdk/bench/jdk/incubator/vector/TestLoadStoreBytes.java line 92: > 90: long byteAlignment = SPECIES.vectorByteSize(); > 91: Arena scope = Arena.ofAuto(); > 92: dstSegment = scope.allocate(size, byteAlignment); Suggestion: srcSegment = Arena.ofAuto().allocate(size, SPECIES.vectorByteSize()); dstSegment = Arena.ofAuto().allocate(size, SPECIES.vectorByteSize()); test/micro/org/openjdk/bench/jdk/incubator/vector/TestLoadStoreShorts.java line 97: > 95: long byteAlignment = SPECIES.vectorByteSize(); > 96: Arena scope = Arena.ofAuto(); > 97: dstSegment = scope.allocate(size, byteAlignment); Suggestion: srcSegment = Arena.ofAuto().allocate(size, SPECIES.vectorByteSize()); dstSegment = Arena.ofAuto().allocate(size, SPECIES.vectorByteSize()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1177131218 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1177131738 From fjiang at openjdk.org Wed Apr 26 01:56:52 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 26 Apr 2023 01:56:52 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 15:37:30 GMT, Vladimir Kempik wrote: > Please review this attempt to 
remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Looks good, with some nits: src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 207: > 205: if (index_size == sizeof(u2)) { > 206: if (AvoidUnalignedAccesses) > 207: { Suggestion: if (AvoidUnalignedAccesses) { src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 208: > 206: if (AvoidUnalignedAccesses) > 207: { > 208: assert(index != tmp, "must use different register"); Suggestion: assert_different_registers(index, tmp); src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 210: > 208: assert(index != tmp, "must use different register"); > 209: load_unsigned_byte(index, Address(xbcp, bcp_offset)); > 210: load_unsigned_byte(tmp, Address(xbcp, bcp_offset+1)); Suggestion: load_unsigned_byte(tmp, Address(xbcp, bcp_offset + 1)); src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 218: > 216: } else if (index_size == sizeof(u4)) { > 217: if (AvoidUnalignedAccesses) > 218: { Suggestion: if (AvoidUnalignedAccesses) { src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 221: > 219: assert(index != tmp, "must use different register"); > 220: load_unsigned_byte(index, Address(xbcp, bcp_offset)); > 221: load_unsigned_byte(tmp, Address(xbcp, bcp_offset+1)); Suggestion: load_unsigned_byte(tmp, Address(xbcp, bcp_offset + 1)); src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 224: > 222: slli(tmp, 
tmp, 8); > 223: add(index, index, tmp); > 224: load_unsigned_byte(tmp, Address(xbcp, bcp_offset+2)); Suggestion: load_unsigned_byte(tmp, Address(xbcp, bcp_offset + 2)); src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 227: > 225: slli(tmp, tmp, 16); > 226: add(index, index, tmp); > 227: load_unsigned_byte(tmp, Address(xbcp, bcp_offset+3)); Suggestion: load_unsigned_byte(tmp, Address(xbcp, bcp_offset + 3)); src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 259: > 257: assert_different_registers(cache, index); > 258: assert_different_registers(cache, xcpool); > 259: //register "cache" is trashed in next shadd, so lets use it as a temporary register Suggestion: // register "cache" is trashed in next shadd, so lets use it as a temporary register src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 297: > 295: size_t index_size) { > 296: assert_different_registers(cache, tmp); > 297: //register "cache" is trashed in next ld, so lets use it as a temporary register Suggestion: // register "cache" is trashed in next ld, so lets use it as a temporary register src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1102: > 1100: __ mv(t1, unsatisfied); > 1101: if (AvoidUnalignedAccesses) > 1102: { Suggestion: if (AvoidUnalignedAccesses) { src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1115: > 1113: __ add(t1, t1, t0); > 1114: } else { > 1115: __ ld(t1, Address(t1, 0)); //2 bytes alligned, but not 4 or 8 Suggestion: __ ld(t1, Address(t1, 0)); // 2 bytes alligned, but not 4 or 8 Should we put this comment into `if` statement? src/hotspot/cpu/riscv/templateTable_riscv.cpp line 378: > 376: // non-null object (String, MethodType, etc.) 
> 377: assert_different_registers(result, tmp); > 378: //register result is trashed by next load, let's use it as temporary register Suggestion: // register result is trashed by next load, let's use it as temporary register src/hotspot/cpu/riscv/templateTable_riscv.cpp line 1626: > 1624: } else { > 1625: __ lhu(x12, at_bcp(1)); > 1626: } Suggestion: if (AvoidUnalignedAccesses) { __ lbu(t1, at_bcp(1)); __ lbu(x12, at_bcp(2)); __ slli(x12, x12, 8); __ add(x12, t1, x12); } else { __ lhu(x12, at_bcp(1)); } src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2027: > 2025: __ shadd(temp, h, array, temp, 3); > 2026: if (AvoidUnalignedAccesses) > 2027: { Suggestion: if (AvoidUnalignedAccesses) { src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2028: > 2026: if (AvoidUnalignedAccesses) > 2027: { > 2028: //array is BytesPerInt (aka 4) alligned Suggestion: // array is BytesPerInt (aka 4) alligned src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2034: > 2032: __ add(temp, temp, t1); > 2033: } else > 2034: { Suggestion: } else { src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2060: > 2058: __ shadd(temp, i, array, temp, 3); > 2059: if (AvoidUnalignedAccesses) > 2060: { Suggestion: if (AvoidUnalignedAccesses) { src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2061: > 2059: if (AvoidUnalignedAccesses) > 2060: { > 2061: //array is BytesPerInt (aka 4) alligned Suggestion: // array is BytesPerInt (aka 4) alligned ------------- Changes requested by fjiang (Author). 
PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1400997892 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177214715 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177213849 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177215730 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177216198 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177215848 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177215934 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177216005 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177224810 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177225034 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177220415 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177231442 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177226383 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177221375 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177221673 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177226617 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177221897 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177222132 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177222582 From fjiang at openjdk.org Wed Apr 26 02:17:57 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 26 Apr 2023 02:17:57 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 15:37:30 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. 
> > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 219: > 217: if (AvoidUnalignedAccesses) > 218: { > 219: assert(index != tmp, "must use different register"); Suggestion: assert_different_registers(index, tmp); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1177244493 From amitkumar at openjdk.org Wed Apr 26 03:33:27 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 03:33:27 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Tue, 25 Apr 2023 12:56:55 GMT, Lutz Schmidt wrote: >> What about using `stop` or anything which doesn't take a condition? That would avoid confusion. > >> What about using `stop` or anything which doesn't take a condition? That would avoid confusion. > > That is a good idea. It's of no benefit to check for the complement of the condition above. I'll update it with `stop`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1177294068 From amitkumar at openjdk.org Wed Apr 26 04:38:53 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 04:38:53 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v4] In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: > This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: address comment from Martin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12822/files - new: https://git.openjdk.org/jdk/pull/12822/files/cbfe5172..5635d301 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12822&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12822&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12822/head:pull/12822 PR: https://git.openjdk.org/jdk/pull/12822 From pminborg at openjdk.org Wed Apr 26 08:13:23 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 26 Apr 2023 08:13:23 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v25] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html > > ### Tests > > Testing excludes tests on the "zero" platform. 
> > - [X] Tier1 > - [X] Tier2 > - [X] Tier3 > - [X] Tier4 > - [X] Tier5 > - [X] Tier6 (Except one test applications/jcstress/init.java as per below) > > > Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) > at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) > Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:497) > at java.base/java.lang.Class.forName(Class.java:476) > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) > ... 
1 more Per Minborg has updated the pull request incrementally with one additional commit since the last revision: 8306668: Some foreign tests fail on x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/ba04f5cc..fbd3520d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=23-24 Stats: 9 lines in 3 files changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From psandoz at openjdk.org Wed Apr 26 08:14:23 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 26 Apr 2023 08:14:23 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v24] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 13:37:15 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> ### Specdiff >> https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html >> >> ### Javadoc >> https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html >> >> ### Tests >> >> Testing excludes tests on the "zero" platform. 
>> >> - [X] Tier1 >> - [X] Tier2 >> - [X] Tier3 >> - [X] Tier4 >> - [X] Tier5 >> - [X] Tier6 (Except one test applications/jcstress/init.java as per below) >> >> >> Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) >> at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) >> Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner >> at java.base/java.lang.Class.forName0(Native Method) >> at java.base/java.lang.Class.forName(Class.java:497) >> at java.base/java.lang.Class.forName(Class.java:476) >> at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) >> ... 1 more > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 42 commits: > > - Merge branch 'master' into PR_21_V2 > - Update test/jdk/java/foreign/TestByteBuffer.java > > Co-authored-by: Andrey Turbanov > - Merge pull request #3 from JornVernee/IsForeignLinkerSupported > > rename has_port > - rename has_port > - Merge pull request #2 from JornVernee/WSL_BB > > account for missing functional in WSL in TestByteBuffer > - account for missing mincore on WSL in TestByteBuffer > - Merge branch 'master' into PR_21_V2 > - 8305369: Issues in zero-length memory segment javadoc section > - 8305087: MemoryLayout API checks should be more eager > - Merge master > - ... and 32 more: https://git.openjdk.org/jdk/compare/9fb53adf...ba04f5cc Overall it looks very good. (As with previous integration PRs, feedback has been ongoing, hence the lack of substantial comments here.) Before pushing, I recommend running through IntelliJ's problem list for all the source, as it will easily catch things like unused code that can be removed (it's easier to suggest that than for me to add these as comments). ------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13079#pullrequestreview-1400862474 From pminborg at openjdk.org Wed Apr 26 08:36:54 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 26 Apr 2023 08:36:54 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v26] In-Reply-To: References: Message-ID: <-2b0YArLgBN3m5koQm5-irDhQkMPTm618hYadF7AjnQ=.09cfa772-05a6-49cf-9106-ea8698aa452b@github.com> > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html > > ### Tests > > Testing excludes tests on the "zero" platform. 
> > - [X] Tier1 > - [X] Tier2 > - [X] Tier3 > - [X] Tier4 > - [X] Tier5 > - [X] Tier6 (Except one test applications/jcstress/init.java as per below) > > > Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) > at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) > Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:497) > at java.base/java.lang.Class.forName(Class.java:476) > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) > ... 
1 more Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/java/lang/foreign/Linker.java Co-authored-by: Paul Sandoz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/fbd3520d..b18d44c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=24-25 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From mdoerr at openjdk.org Wed Apr 26 08:42:00 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Apr 2023 08:42:00 GMT Subject: Integrated: 8306823: Native memory leak in SharedRuntime::notify_jvmti_unmount/mount. In-Reply-To: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> References: <5pnvV5z6BaaPhKb0UWkA5wEu3FfaKZfepYGXtMx_tDk=.da25228a-18bf-450d-a5dd-c5114d5f046c@github.com> Message-ID: On Tue, 25 Apr 2023 13:19:36 GMT, Martin Doerr wrote: > The code introduced by [JDK-8304303](https://bugs.openjdk.org/browse/JDK-8304303) uses `JNIHandles::make_local` which requires `JNIHandles::destroy_local` which is currently missing. This pull request has now been integrated. Changeset: d7476982 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/d74769826ddb5e68df76407fb94c7560475249a0 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8306823: Native memory leak in SharedRuntime::notify_jvmti_unmount/mount. 
Reviewed-by: pchilanomate, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/13641 From pminborg at openjdk.org Wed Apr 26 08:53:53 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 26 Apr 2023 08:53:53 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v27] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html > > ### Tests > > Testing excludes tests on the "zero" platform. > > - [X] Tier1 > - [X] Tier2 > - [X] Tier3 > - [X] Tier4 > - [X] Tier5 > - [X] Tier6 (Except one test applications/jcstress/init.java as per below) > > > Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) > at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) > Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:497) > at 
java.base/java.lang.Class.forName(Class.java:476) > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) > ... 1 more Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Update test/micro/org/openjdk/bench/jdk/incubator/vector/TestLoadStoreShorts.java Co-authored-by: Paul Sandoz - Update test/micro/org/openjdk/bench/jdk/incubator/vector/TestLoadStoreBytes.java Co-authored-by: Paul Sandoz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/b18d44c7..228d4c68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=25-26 Stats: 12 lines in 2 files changed: 0 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From qamai at openjdk.org Wed Apr 26 09:40:53 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 26 Apr 2023 09:40:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: Message-ID: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> On Thu, 20 Apr 2023 11:15:47 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. 
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). 
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? 
| 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary comments src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 81: > 79: // C2CodeStubList::emit() will throw an assertion and report the actual size that > 80: // is needed. > 81: return 33; This should be 36 with `ASSERT` and 21 without. 
If you are sure that `JavaThread::lock_stack_top_offset()` or `OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)` fits within an `int8_t` then it saves 3 bytes for each usage. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 691: > 689: jccb(Assembler::notEqual, NO_COUNT); // If not recursive, ZF = 0 at this point (fail) > 690: incq(Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions))); > 691: xorq(rax, rax); // Set ZF = 1 (success) for recursive lock, denoting locking success `xorl` would save a byte here, and in some similar places. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 701: > 699: // ZFlag == 0 count in slow path > 700: jccb(Assembler::notZero, NO_COUNT); // jump if ZFlag == 0 > 701: `DONE_LABEL` is conditionally jumped into from a lot of places; the only path on which it is reached without a known `ZF` seems to be the `LM_LEGACY` fall-through. Maybe refactor a little to eliminate this block. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 781: > 779: #ifdef _LP64 > 780: C2HandleAnonOMOwnerStub* stub = new (Compile::current()->comp_arena()) C2HandleAnonOMOwnerStub(tmpReg, boxReg); > 781: Compile::current()->output()->add_stub(stub); This should be added only if we are really emitting the code (i.e. not emitting into a scratch buffer to measure the node size). src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 926: > 924: // Intentional fall-thru into DONE_LABEL > 925: } > 926: bind(DONE_LABEL); Similar to `fast_lock`, this `DONE_LABEL` can be removed. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9704: > 9702: > 9703: // If successful, push object to lock-stack. > 9704: movl(tmp, Address(thread, JavaThread::lock_stack_top_offset())); This value seems to be loaded twice, can the two loads be merged? 
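
For readers following the review comments above, the per-thread lock-stack described in the quoted PR text (a small fixed-size array of object references, scanned linearly to answer "does the current thread own me?", with a slow path when full) can be sketched roughly as follows. This is only an illustration under stated assumptions: `LockStackSketch`, `ObjRef`, `try_push`, `contains` and `remove` are hypothetical names and not HotSpot's actual classes or methods; only the capacity of 8 comes from the PR description.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an object reference (an "oop" in HotSpot terms).
using ObjRef = const void*;

// Illustrative fixed-capacity, per-thread lock stack. Not HotSpot code.
class LockStackSketch {
 public:
  // The PR text says the lock-stack currently has 8 elements.
  static constexpr std::size_t kCapacity = 8;

  bool is_full() const { return top_ == kCapacity; }
  std::size_t size() const { return top_; }

  // Fast-path push. Returns false when the stack is full, in which case
  // the caller would take the slow path (inflate to a full monitor).
  bool try_push(ObjRef obj) {
    if (is_full()) return false;
    slots_[top_++] = obj;
    return true;
  }

  // "Does the current thread own me?": a linear scan of a tiny array.
  bool contains(ObjRef obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) return true;
    }
    return false;
  }

  // Remove an entry on unlock. Entries above it are shifted down so the
  // array stays densely packed. Returns false if obj was not present.
  bool remove(ObjRef obj) {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) {
        for (std::size_t j = i + 1; j < top_; ++j) slots_[j - 1] = slots_[j];
        --top_;
        return true;
      }
    }
    return false;
  }

 private:
  std::array<ObjRef, kCapacity> slots_{};
  std::size_t top_ = 0;
};
```

The sketch leaves out everything that makes the real change hard (the CAS on the mark word, GC root scanning, and the assembly fast paths under review here); it only shows why the ownership query is cheap and where the overflow check sits.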
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177505070 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177289743 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177289153 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177301239 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177302528 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177293052 From qamai at openjdk.org Wed Apr 26 09:40:54 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 26 Apr 2023 09:40:54 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> References: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> Message-ID: <_Y6eLacV_ecmijvlzo2lGe-U5n6ZtaJnUA6KL9BsJJw=.a66f23b0-aaf6-43ac-a210-ad830a1e744c@github.com> On Wed, 26 Apr 2023 08:12:41 GMT, Quan Anh Mai wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary comments > > src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 81: > >> 79: // C2CodeStubList::emit() will throw an assertion and report the actual size that >> 80: // is needed. >> 81: return 33; > > This should be 36 with `ASSERT` and 21 without. If you are sure that `JavaThread::lock_stack_top_offset()` or `OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)` fits within an `int8_t` then it reduces 3 bytes for each usage. This stub has 2 instructions, and it seems not really uncommon, is it worth it to have a stub here? 
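
As background for the stub question above: the anonymous-owner fix-up that the quoted PR description explains (a contending thread inflates a fast-locked object without knowing its owner, and the real owner repairs the owner field at monitorexit) can be modelled as below. This is a single-threaded sketch under assumptions, not the HotSpot implementation: `MonitorSketch`, `inflate_fast_locked`, `exit_monitor` and the marker value `1` are all hypothetical, and a real implementation would use atomic operations.

```cpp
#include <cstdint>

// Hypothetical thread identity; real thread ids are assumed to be >= 2
// so that they never collide with the two marker values below.
using ThreadId = std::uintptr_t;
constexpr ThreadId kNoOwner = 0;
constexpr ThreadId kAnonymousOwner = 1;  // assumed marker value

// Illustrative monitor holding only the owner field.
struct MonitorSketch {
  ThreadId owner = kNoOwner;
};

// A contending thread inflates a fast-locked object. It cannot tell which
// thread holds the fast-lock, so it records the anonymous marker instead.
inline MonitorSketch inflate_fast_locked() {
  return MonitorSketch{kAnonymousOwner};
}

// Called by a thread at monitorexit. The caller is assumed to have already
// established that it holds the lock (for example via its lock-stack), so
// an anonymous owner can only be the caller itself.
inline bool exit_monitor(MonitorSketch& m, ThreadId self) {
  if (m.owner == kAnonymousOwner) {
    m.owner = self;  // fix up the owner before the regular exit
  }
  if (m.owner != self) {
    return false;    // not the owner: nothing is released
  }
  m.owner = kNoOwner;  // regular exit, handing over to the contender
  return true;
}
```

Seen this way, the fix-up is one compare and one store before the normal exit path, which is the trade-off the comment above raises: whether so little work justifies an out-of-line stub.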
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 781: > >> 779: #ifdef _LP64 >> 780: C2HandleAnonOMOwnerStub* stub = new (Compile::current()->comp_arena()) C2HandleAnonOMOwnerStub(tmpReg, boxReg); >> 781: Compile::current()->output()->add_stub(stub); > > This should be added only if we are really emitting the code (i.e. not emitting into a scratch buffer to measure the node size) Also, I think this `if (LockingMode == LM_LIGHTWEIGHT)` block should be moved out of the enclosing if block, we are checking for inflation here, it seems logical to separate the inflation path out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177570766 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177304198 From pminborg at openjdk.org Wed Apr 26 09:46:53 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 26 Apr 2023 09:46:53 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v28] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html > > ### Tests > > Testing excludes tests on the "zero" platform. 
> > - [X] Tier1 > - [X] Tier2 > - [X] Tier3 > - [X] Tier4 > - [X] Tier5 > - [X] Tier6 (Except one test applications/jcstress/init.java as per below) > > > Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) > at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) > Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:497) > at java.base/java.lang.Class.forName(Class.java:476) > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) > ... 
1 more Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Use pattern maching in for-loop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/228d4c68..8393218c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=26-27 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From mdoerr at openjdk.org Wed Apr 26 09:50:53 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Apr 2023 09:50:53 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Wed, 26 Apr 2023 03:20:42 GMT, Amit Kumar wrote: >>> What about using `stop` or anything which doesn't take a condition? That would avoid confusion. >> >> That is a good idea. It's of no benefit to check for the complement of the condition above. > > I'll update it with `stop`. Thanks! That's much better readable. Note that there are more places like https://github.com/openjdk/jdk/blob/5635d301d89f62874969e475b53dcbd155aa56cb/src/hotspot/cpu/s390/stubRoutines_s390.cpp#L68. In addition, you don't need the illtrap after stop any more. Would be great if you could clean it up. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1177616127 From tschatzl at openjdk.org Wed Apr 26 09:51:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Apr 2023 09:51:28 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Wed, 26 Apr 2023 09:20:46 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list.
> > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas The first two commits in the branch https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:remove-pinned-mask show this change (first two patches) in the context of what is to come for [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326), which is the fourth patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13666#issuecomment-1523085024 From tschatzl at openjdk.org Wed Apr 26 09:51:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Apr 2023 09:51:28 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 Message-ID: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Hi all, please review this refactoring of collection set candidate set handling. The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). This patch only uses candidates from marking at this time. Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise. In detail: * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). * there are several additional helper sets/lists * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily).
Also does not contain gc efficiencies. * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. All these sets implement C++ iterators for simpler use in various places. Testing: - this patch only: tier1-3, gha - with JDK-8140326 tier1-7 (or 8?) Thanks, Thomas ------------- Commit messages: - Whitespace fixes - typo - More cleanup - Cleanup - Cleanup - Refactor collection set candidates Changes: https://git.openjdk.org/jdk/pull/13666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306541 Stats: 1137 lines in 26 files changed: 659 ins; 231 del; 247 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From fyang at openjdk.org Wed Apr 26 09:51:56 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Apr 2023 09:51:56 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 13:35:25 GMT, Per Minborg wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Add example for Option::captureStateLayout > > A review of all the copyright years shall be made in this PR. @minborg: Hello, I think you should list Feilong Jiang (OpenJDK ID: fjiang) as co-author instead. It is he who added the code for the riscv port.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1523097913 From amitkumar at openjdk.org Wed Apr 26 10:14:30 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 10:14:30 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v5] In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: > This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: illtrap cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12822/files - new: https://git.openjdk.org/jdk/pull/12822/files/5635d301..a6386679 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12822&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12822&range=03-04 Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12822/head:pull/12822 PR: https://git.openjdk.org/jdk/pull/12822 From amitkumar at openjdk.org Wed Apr 26 10:14:33 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 10:14:33 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v2] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <4YyyWF5efckg_fV1mDe-n5wA_NuL1f2_0dV-UlubXYg=.d0910723-35d3-45eb-9469-716da102928c@github.com> Message-ID: On Wed, 26 Apr 2023 09:36:36 GMT, Martin Doerr wrote: >> I'll update it with `stop`. > > Thanks! That's much better readable. 
Note that there are more places like https://github.com/openjdk/jdk/blob/5635d301d89f62874969e475b53dcbd155aa56cb/src/hotspot/cpu/s390/stubRoutines_s390.cpp#L68. In addition, you don't need the illtrap after stop any more. Would be great if you could clean it up. done.. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12822#discussion_r1177649630 From pminborg at openjdk.org Wed Apr 26 10:20:53 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 26 Apr 2023 10:20:53 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v29] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html > > ### Tests > > Testing excludes tests on the "zero" platform. > > - [X] Tier1 > - [X] Tier2 > - [X] Tier3 > - [X] Tier4 > - [X] Tier5 > - [X] Tier6 (Except one test applications/jcstress/init.java as per below) > > > Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) > at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) > Caused by: java.lang.ClassNotFoundException: 
/opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:497) > at java.base/java.lang.Class.forName(Class.java:476) > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) > ... 1 more Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 48 commits: - Merge branch 'master' into PR_21_V2 - Use pattern maching in for-loop - Update test/micro/org/openjdk/bench/jdk/incubator/vector/TestLoadStoreShorts.java Co-authored-by: Paul Sandoz - Update test/micro/org/openjdk/bench/jdk/incubator/vector/TestLoadStoreBytes.java Co-authored-by: Paul Sandoz - Update src/java.base/share/classes/java/lang/foreign/Linker.java Co-authored-by: Paul Sandoz - 8306668: Some foreign tests fail on x86 - Merge branch 'master' into PR_21_V2 - Update test/jdk/java/foreign/TestByteBuffer.java Co-authored-by: Andrey Turbanov - Merge pull request #3 from JornVernee/IsForeignLinkerSupported rename has_port - rename has_port - ... 
and 38 more: https://git.openjdk.org/jdk/compare/d7476982...0365fcf2 ------------- Changes: https://git.openjdk.org/jdk/pull/13079/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=28 Stats: 13420 lines in 270 files changed: 5100 ins; 6182 del; 2138 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From mdoerr at openjdk.org Wed Apr 26 10:24:04 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Apr 2023 10:24:04 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v5] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: <8pxq-zEGVXj3Mc-WLLWipbsLG-GQAwDSqjyXh_oYvDU=.590325c4-f7c1-497d-8d58-c7cb4d56fcbd@github.com> On Wed, 26 Apr 2023 10:14:30 GMT, Amit Kumar wrote: >> This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > illtrap cleanup LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12822#pullrequestreview-1401664088 From vkempik at openjdk.org Wed Apr 26 10:29:25 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 26 Apr 2023 10:29:25 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v2] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. 
> > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: fix nits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/00cfe09a..adbf3cbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=00-01 Stats: 28 lines in 3 files changed: 0 ins; 6 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From lucy at openjdk.org Wed Apr 26 10:37:26 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Apr 2023 10:37:26 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v5] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Wed, 26 Apr 2023 10:14:30 GMT, Amit Kumar wrote: >> This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > illtrap cleanup LGTM. Finally, a long standing PR comes to a successful end! 
Thank you for all your effort and patience. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12822#pullrequestreview-1401690268 From amitkumar at openjdk.org Wed Apr 26 10:38:53 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 10:38:53 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v5] In-Reply-To: <8pxq-zEGVXj3Mc-WLLWipbsLG-GQAwDSqjyXh_oYvDU=.590325c4-f7c1-497d-8d58-c7cb4d56fcbd@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> <8pxq-zEGVXj3Mc-WLLWipbsLG-GQAwDSqjyXh_oYvDU=.590325c4-f7c1-497d-8d58-c7cb4d56fcbd@github.com> Message-ID: <_00zARgRd5s7HD-3QkhZiTGbD7HWWGKt1lfHF-p3c3Y=.5d0f06b6-03f4-4f9a-b726-7cc114fb3d1b@github.com> On Wed, 26 Apr 2023 10:18:18 GMT, Martin Doerr wrote: > LGTM. Thank You Martin.. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12822#issuecomment-1523183399 From vkempik at openjdk.org Wed Apr 26 10:40:28 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 26 Apr 2023 10:40:28 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v3] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/adbf3cbd..8323870d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From vkempik at openjdk.org Wed Apr 26 10:40:29 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 26 Apr 2023 10:40:29 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 15:37:30 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Thanks for taking a look Feilong Jiang ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1523175966 From amitkumar at openjdk.org Wed Apr 26 11:11:54 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 11:11:54 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Fri, 17 Mar 2023 11:02:42 GMT, Lutz Schmidt wrote: >> So does that make sense to not define them inlined at all ? Because if I do not include `.inline.hpp` then it's doesn't build. > >> So does that make sense to not define them inlined at all ? Because if I do not include `.inline.hpp` then it's doesn't build. > > I was going to ask you anyway why you decided to convert the asm_assert stuff to inline methods. In general, you do that for performance reasons. Sometimes, it even helps with code size. Think of simple getter and setter methods, where the call overhead is much larger than the "useful" code. > > The following rules can guide you with "optimization" decisions: > > - Optimize for the release build case (execution time and footprint). > - Optimize for the hottest code path. > - Optimize for the "OK" case. > - The path taken if an error occurred does not need to be as efficient as possible. > - If an assertion fails, it's the end of the JVM. There is no need to rush. > > With the inclusion of macroAssembler.inline.hpp in interp_masm.hpp you are out of luck. As Martin stated, this is not acceptable. *.inline.hpp files are only to be included in *.cpp files. There are 63 *.cpp files including interp_masm.hpp. 
Any of these may require the definitions from macroAssembler.inline.hpp. How to find out which ones? Trial and error. Thank you @RealLucy, I rebased locally and build is fine with new commits as well. So I guess we are good to go.. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12822#issuecomment-1523218154 From shade at openjdk.org Wed Apr 26 11:11:54 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Apr 2023 11:11:54 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v4] In-Reply-To: <0t8Fp7BDksOpJK4c2fU93aMSvGS8SthjDFk1LM4BFi0=.4614b9e8-407f-4071-ba68-29c18f2b929a@github.com> References: <0t8Fp7BDksOpJK4c2fU93aMSvGS8SthjDFk1LM4BFi0=.4614b9e8-407f-4071-ba68-29c18f2b929a@github.com> Message-ID: On Tue, 25 Apr 2023 09:38:14 GMT, Aleksei Voitylov wrote: >> Provides missing implementation for arm32. >> >> Testing: hotspot/jtreg. > > Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: > > address Matias comments Still good. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13596#pullrequestreview-1401727075 From lucy at openjdk.org Wed Apr 26 11:22:04 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Apr 2023 11:22:04 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Fri, 17 Mar 2023 11:02:42 GMT, Lutz Schmidt wrote: >> So does that make sense to not define them inlined at all ? Because if I do not include `.inline.hpp` then it's doesn't build. > >> So does that make sense to not define them inlined at all ? Because if I do not include `.inline.hpp` then it's doesn't build. > > I was going to ask you anyway why you decided to convert the asm_assert stuff to inline methods. In general, you do that for performance reasons. Sometimes, it even helps with code size. 
Think of simple getter and setter methods, where the call overhead is much larger than the "useful" code. > > The following rules can guide you with "optimization" decisions: > > - Optimize for the release build case (execution time and footprint). > - Optimize for the hottest code path. > - Optimize for the "OK" case. > - The path taken if an error occurred does not need to be as efficient as possible. > - If an assertion fails, it's the end of the JVM. There is no need to rush. > > With the inclusion of macroAssembler.inline.hpp in interp_masm.hpp you are out of luck. As Martin stated, this is not acceptable. *.inline.hpp files are only to be included in *.cpp files. There are 63 *.cpp files including interp_masm.hpp. Any of these may require the definitions from macroAssembler.inline.hpp. How to find out which ones? Trial and error. > @RealLucy Only the author (@offamitkumar) is allowed to issue the `integrate` command. Stupidity rules! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12822#issuecomment-1523241841 From rkennke at openjdk.org Wed Apr 26 11:26:56 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Apr 2023 11:26:56 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: <_Y6eLacV_ecmijvlzo2lGe-U5n6ZtaJnUA6KL9BsJJw=.a66f23b0-aaf6-43ac-a210-ad830a1e744c@github.com> References: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> <_Y6eLacV_ecmijvlzo2lGe-U5n6ZtaJnUA6KL9BsJJw=.a66f23b0-aaf6-43ac-a210-ad830a1e744c@github.com> Message-ID: On Wed, 26 Apr 2023 09:03:07 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 81: >> >>> 79: // C2CodeStubList::emit() will throw an assertion and report the actual size that >>> 80: // is needed. >>> 81: return 33; >> >> This should be 36 with `ASSERT` and 21 without. 
If you are sure that `JavaThread::lock_stack_top_offset()` or `OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)` fits within an `int8_t` then it reduces 3 bytes for each usage. > > This stub has 2 instructions, and it seems not really uncommon, is it worth it to have a stub here? Ok I will change the value. Yes, this path is relatively uncommon (monitors are inflated only once, and not necessarily via ANONYMOUS handshake, but used often), and this path is performance relevant. The original impl had the two instructions inlined, but the common forward branch impacted performance. >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 781: >> >>> 779: #ifdef _LP64 >>> 780: C2HandleAnonOMOwnerStub* stub = new (Compile::current()->comp_arena()) C2HandleAnonOMOwnerStub(tmpReg, boxReg); >>> 781: Compile::current()->output()->add_stub(stub); >> >> This should be added only if we are really emitting the code (i.e. not emitting into a scratch buffer to measure the node size) > > Also, I think this `if (LockingMode == LM_LIGHTWEIGHT)` block should be moved out of the enclosing if block, we are checking for inflation here, it seems logical to separate the inflation path out. How would I check if we are emitting code? I am not sure I understand. The check for ANONYMOUS is only relevant when we observe an already-inflated monitor. I think this is the right place to put it. 
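
On the question of how to tell whether code is really being emitted: the review comment refers to emitting into a scratch buffer only to measure the node size, so side effects such as registering a stub need a guard for that pass. A minimal sketch of such a guard follows; `Output`, `Stub`, `in_scratch_emit_size` and `emit_fast_unlock` are simplified stand-ins written for illustration (loosely modeled on, but not copied from, C2's output phase), so treat every name here as an assumption:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an out-of-line code stub.
struct Stub { int id; };

// Hypothetical stand-in for the compiler's output phase.
struct Output {
  bool in_scratch_emit_size = false;  // true while only measuring node size
  std::vector<Stub> stubs;            // stubs to be emitted after the main code
  void add_stub(const Stub& s) { stubs.push_back(s); }
};

// Emits the fast path and registers the slow-path stub only on a real emit,
// so the sizing pass does not leave an extra stub registered.
inline void emit_fast_unlock(Output& out, int stub_id) {
  // ... fast-path instructions would be emitted here in both passes ...
  if (!out.in_scratch_emit_size) {
    out.add_stub(Stub{stub_id});
  }
}
```

The branch to the stub presumably still has to be emitted in both passes so that the measured size stays accurate; only the registration is skipped while sizing.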
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177703431 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177699910 From qamai at openjdk.org Wed Apr 26 11:26:57 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 26 Apr 2023 11:26:57 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> <_Y6eLacV_ecmijvlzo2lGe-U5n6ZtaJnUA6KL9BsJJw=.a66f23b0-aaf6-43ac-a210-ad830a1e744c@github.com> Message-ID: On Wed, 26 Apr 2023 10:52:29 GMT, Roman Kennke wrote: >> This stub has 2 instructions, and it seems not really uncommon, is it worth it to have a stub here? > > Ok I will change the value. > Yes, this path is relatively uncommon (monitors are inflated only once, and not necessarily via ANONYMOUS handshake, but used often), and this path is performance relevant. The original impl had the two instructions inlined, but the common forward branch impacted performance. I see, thanks a lot for your explanations >> Also, I think this `if (LockingMode == LM_LIGHTWEIGHT)` block should be moved out of the enclosing if block, we are checking for inflation here, it seems logical to separate the inflation path out. > > How would I check if we are emitting code? > > I am not sure I understand. The check for ANONYMOUS is only relevant when we observe an already-inflated monitor. I think this is the right place to put it. 
The entry barrier does this: https://github.com/openjdk/jdk/blob/86f41a4c42268d364175263804eb4d1ce82fa943/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L139 `testptr(tmpReg, markWord::monitor_value)` is checking for inflation, and the following `if` block acts when inflation is detected. What I mean is to move the whole enclosed `if` down, out of the `if (LockingMode != LM_MONITOR)` block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177712615 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177711237 From rkennke at openjdk.org Wed Apr 26 11:27:02 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Apr 2023 11:27:02 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> References: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> Message-ID: On Wed, 26 Apr 2023 03:09:53 GMT, Quan Anh Mai wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary comments > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 691: > >> 689: jccb(Assembler::notEqual, NO_COUNT); // If not recursive, ZF = 0 at this point (fail) >> 690: incq(Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions))); >> 691: xorq(rax, rax); // Set ZF = 1 (success) for recursive lock, denoting locking success > > `xorl` would save a byte here, and some similar places. Yes, but see above. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 701: > >> 699: // ZFlag == 0 count in slow path >> 700: jccb(Assembler::notZero, NO_COUNT); // jump if ZFlag == 0 >> 701: > > `DONE_LABEL` is conditionally jumped into from a lot of places, the only path on which it is reached without a known `ZF` seems to be the `LM_LEGACY` fall-through. Maybe refactor a little to eliminate this block.
I intentionally have not changed the existing paths to make it absolutely clear that the old behaviour is not changed. I'd rather make any changes to the stack-locking in a separate follow-up. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 926: > >> 924: // Intentional fall-thru into DONE_LABEL >> 925: } >> 926: bind(DONE_LABEL); > > Similar to `fast_lock`, this `DONE_LABEL` can be removed. Yes but see above ;-) > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9704: > >> 9702: >> 9703: // If successful, push object to lock-stack. >> 9704: movl(tmp, Address(thread, JavaThread::lock_stack_top_offset())); > > This value seems to be loaded twice, can they be merged? That would be nice, but we cannot do this without allocating another tmp register. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177695188 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177695040 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177700194 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177696506 From amitkumar at openjdk.org Wed Apr 26 11:36:57 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 11:36:57 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v6] In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: > This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: - Merge branch 'master' into simplify_assert - illtrap cleanup - address comment from Martin - Merge branch 'master' into simplify_assert - added id to illtrap - address lutz comment & revert inlining - Revert "inline asm_assert" - Revert "inlined assert method" - Revert "added inline keyword before function implementation" - added inline keyword before function implementation - ... and 7 more: https://git.openjdk.org/jdk/compare/35e7bc21...d830eead ------------- Changes: https://git.openjdk.org/jdk/pull/12822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12822&range=05 Stats: 85 lines in 9 files changed: 3 ins; 39 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/12822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12822/head:pull/12822 PR: https://git.openjdk.org/jdk/pull/12822 From amitkumar at openjdk.org Wed Apr 26 11:37:25 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 11:37:25 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v5] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: <57QiO3V9cITIZo-7CS89dBsZpJ55UhgFNMlUEEYWYVY=.f1dbada9-7de4-4ed1-b2ca-d586dc0dd8bd@github.com> On Wed, 26 Apr 2023 10:14:30 GMT, Amit Kumar wrote: >> This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > illtrap cleanup PR https://github.com/openjdk/jdk/pull/13650, brought merge conflict... 
:-( ------------- PR Comment: https://git.openjdk.org/jdk/pull/12822#issuecomment-1523248512 From fyang at openjdk.org Wed Apr 26 11:45:28 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Apr 2023 11:45:28 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v2] In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 13:09:37 GMT, Fredrik Bredberg wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. >> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated after review FYI: I also performed tier1-3 tests on linux-riscv64 unmatched boards, result looks good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13477#issuecomment-1523270676 From amitkumar at openjdk.org Wed Apr 26 11:56:58 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 11:56:58 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v6] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: <9e836tK6B7NbLcEA3fOFwMMEti_BDxXpVQTrjWKzFqs=.89eaaed6-c43d-4728-ab48-9482a32d7902@github.com> On Wed, 26 Apr 2023 11:36:57 GMT, Amit Kumar wrote: >> This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into simplify_assert > - illtrap cleanup > - address comment from Martin > - Merge branch 'master' into simplify_assert > - added id to illtrap > - address lutz comment & revert inlining > - Revert "inline asm_assert" > - Revert "inlined assert method" > - Revert "added inline keyword before function implementation" > - added inline keyword before function implementation > - ... and 7 more: https://git.openjdk.org/jdk/compare/35e7bc21...d830eead locally build is fine.. with current changes as well.. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12822#issuecomment-1523273494 From lucy at openjdk.org Wed Apr 26 12:08:53 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Apr 2023 12:08:53 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v6] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Wed, 26 Apr 2023 11:36:57 GMT, Amit Kumar wrote: >> This PR cleanups some assert statements and specifies branch condition at calling site itself. 
Remaining asm_assert methods are inlined as well. > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into simplify_assert > - illtrap cleanup > - address comment from Martin > - Merge branch 'master' into simplify_assert > - added id to illtrap > - address lutz comment & revert inlining > - Revert "inline asm_assert" > - Revert "inlined assert method" > - Revert "added inline keyword before function implementation" > - added inline keyword before function implementation > - ... and 7 more: https://git.openjdk.org/jdk/compare/35e7bc21...d830eead > PR #13650, brought merge conflict... :-( Wrong integration order, probably. Should have sponsored this PR first, but... (see above). I'll integrate once the tests have completed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12822#issuecomment-1523289450 From duke at openjdk.org Wed Apr 26 13:15:53 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 26 Apr 2023 13:15:53 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: <21q146HoJ5QRulUmci1HV1j94Y1d2RQxSUcEt-4mzOY=.fd41bc65-0183-49fc-a784-0081a885ba45@github.com> On Tue, 25 Apr 2023 21:48:28 GMT, Vladimir Ivanov wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > There are other compiler-related flags (e.g., `JVM_ACC_NOT_C1_COMPILABLE`/`JVM_ACC_NOT_C2_COMPILABLE`/`JVM_ACC_NOT_C2_OSR_COMPILABLE` ) which should not be persisted into CDS archive. 
> > Also, I assume the following flags should be always unset in the archived version: > `JVM_ACC_IS_PREFIXED_NATIVE`, `JVM_ACC_HAS_RESOLVED_METHOD`, `JVM_ACC_IS_OLD`, `JVM_ACC_IS_OBSOLETE`, `JVM_ACC_ON_STACK`, `JVM_ACC_IS_DELETED`, `JVM_ACC_IS_BEING_REDEFINED`. @iwanowww I don't see `JVM_ACC_HAS_RESOLVED_METHOD` and `JVM_ACC_IS_BEING_REDEFINED` used/referred anywhere in the code base. Where do you see them? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1523386687 From rkennke at openjdk.org Wed Apr 26 13:16:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Apr 2023 13:16:26 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v63] In-Reply-To: References: Message-ID: <89qIyO7ZpW-n-BqgskUr0vD04rjzu7ugBJZj7t5sonA=.2f7ed1ec-a2db-4042-97fd-68db0b02b292@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
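A minimal C++ sketch of the header encoding just described may make it more concrete. It is an illustration under assumptions, not HotSpot code: the names `ObjectSketch`, `StackLockSketch`, `try_stack_lock`, `stack_lock_of` and `stack_unlock` are hypothetical, and the `01` tag used for an unlocked header is assumed for the example. Only the `00` tag and the CAS of a stack address into the header are taken from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uintptr_t kTagMask        = 0b11;  // the lowest two header bits
constexpr uintptr_t kStackLockedTag = 0b00;  // 'stack-locked', as described above
constexpr uintptr_t kUnlockedTag    = 0b01;  // assumed value for this sketch

// Stand-in for an object with a one-word header (hypothetical).
struct ObjectSketch {
  std::atomic<uintptr_t> header;
  explicit ObjectSketch(uintptr_t mark) : header(mark) {}
};

// Lives in the owning thread's stack frame and holds only the displaced header.
// The alignment guarantees that the two low bits of its address are 00.
struct alignas(8) StackLockSketch {
  uintptr_t displaced_header = 0;
};

// Save the current header in the stack slot, then CAS the slot's address
// into the object header. Fails if the object is not currently unlocked.
bool try_stack_lock(ObjectSketch& obj, StackLockSketch& slot) {
  uintptr_t mark = obj.header.load();
  if ((mark & kTagMask) != kUnlockedTag) return false;
  slot.displaced_header = mark;
  uintptr_t locked = reinterpret_cast<uintptr_t>(&slot);
  return obj.header.compare_exchange_strong(mark, locked);
}

// The header itself is the pointer used to find the owner's stack slot.
StackLockSketch* stack_lock_of(const ObjectSketch& obj) {
  uintptr_t mark = obj.header.load();
  if ((mark & kTagMask) != kStackLockedTag) return nullptr;
  return reinterpret_cast<StackLockSketch*>(mark);
}

// Restore the displaced header if the object is still locked via this slot.
bool stack_unlock(ObjectSketch& obj, StackLockSketch& slot) {
  uintptr_t expected = reinterpret_cast<uintptr_t>(&slot);
  return obj.header.compare_exchange_strong(expected, slot.displaced_header);
}
```

While the lock is held, the header word contains a stack address rather than the mark word, so any code that wants the real header has to go through the stack slot. That is the overloading the PR description refers to.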
> > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor.
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR.
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Suggested changes by @merykitty ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/5d0a0451..5f927f9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=62 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=61-62 Stats: 13 lines in 2 files changed: 6 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From duke at openjdk.org Wed Apr 26 14:46:00 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Wed, 26 Apr 2023 14:46:00 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v3] In-Reply-To: References: Message-ID: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. > > This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing. > > By both freezing and thawing the padding we remove the need to change any relative offsets at runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity tested using QEMU. Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Put back assert in recurse_thaw_interpreted_frame - Merge branch 'master' into freeze_thaw_interpreter_JDK-8300197_2023-01-19 - Updated after review - 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13477/files - new: https://git.openjdk.org/jdk/pull/13477/files/3b4c0fa3..8de6a1f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13477&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13477&range=01-02 Stats: 252588 lines in 1747 files changed: 233939 ins; 8300 del; 10349 mod Patch: https://git.openjdk.org/jdk/pull/13477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13477/head:pull/13477 PR: https://git.openjdk.org/jdk/pull/13477 From duke at openjdk.org Wed Apr 26 14:46:23 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Wed, 26 Apr 2023 14:46:23 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v3] In-Reply-To: References: <4e8KcPoKR30cttDfzPPwWCH1BNLje1EyVJKSjyMRHXY=.e60bcac8-4bd8-431a-8e5d-e5519718f0d0@github.com> Message-ID: On Fri, 21 Apr 2023 16:03:50 GMT, Patricio Chilano Mateo wrote: >> Your're right @pchilano, it should. But since the line just above is: >> `const int fsize = heap_frame_bottom - heap_frame_top;` >> I thought that there was no use in keeping the `assert` at all, so I removed it. > > Yes, but it's comparing it with the created stack frame's size so I think it doesn't hurt to keep it. We have an equivalent assert in recurse_freeze_interpreted_frame(). Put back the assert. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13477#discussion_r1177967812 From pchilanomate at openjdk.org Wed Apr 26 14:57:33 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 26 Apr 2023 14:57:33 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v3] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:46:00 GMT, Fredrik Bredberg wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. >> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Put back assert in recurse_thaw_interpreted_frame > - Merge branch 'master' into freeze_thaw_interpreter_JDK-8300197_2023-01-19 > - Updated after review > - 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Thanks @fbredber! ------------- Marked as reviewed by pchilanomate (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13477#pullrequestreview-1402187795 From tschatzl at openjdk.org Wed Apr 26 14:57:54 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Apr 2023 14:57:54 GMT Subject: RFR: 8305566: ZGC: gc/stringdedup/TestStringDeduplicationFullGC.java#Z failed with SIGSEGV in ZBarrier::weak_load_barrier_on_phantom_oop_slow_path [v2] In-Reply-To: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> References: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> Message-ID: On Mon, 24 Apr 2023 09:25:01 GMT, Kim Barrett wrote: >> Please review this change to the string deduplication thread to make it a kind >> of JavaThread rather than a ConcurrentGCThread. There are several pieces to >> this change: >> >> (1) New class StringDedupThread (derived from JavaThread), separate from >> StringDedup::Processor (which is now just a CHeapObj instead of deriving from >> ConcurrentGCThread). The thread no longer needs to or supports being stopped, >> like other similar threads. It also needs to be started later, once Java >> threads are supported. Also don't need an explicit visitor, since it will be >> in the normal Java threads list. This separation made the changeover a little >> cleaner to develop, and made the servicability support a little cleaner too. >> >> (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, >> instead of using the SuspendibleThreadSet facility. >> >> (3) Because we're using ThreadBlockInVM, which has a different usage style >> from STS, the tracking of time spent by the processor blocked for safepoints >> doesn't really work. It's not very important anyway, since normal thread >> descheduling can also affect the normal processing times being gathered and >> reported. So we just drop the so-called "blocked" time and associated >> infrastructure, simplifying Stat tracking a bit. 
Also renamed the >> "concurrent" stat to be "active", since it's all in a JavaThread now. >> >> (4) To avoid #include problems, moved the definition of >> JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, >> where one of the functions it calls also is defined. >> >> (5) Added servicability support for the new thread. >> >> Testing: >> mach5 tier1-3 with -XX:+UseStringDeduplication. >> The test runtime/cds/DeterministicDump.java fails intermittently with that >> option, which is not surprising - see JDK-8306712. >> >> I was never able to reproduce the failure; it's likely quite timing sensitive. >> The fix of changing the type is based on StefanK's comment that ZResurrection >> doesn't expect a non-Java thread to perform load-barriers. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > fix include order Please improve the bug name as suggested by @shipilev before pushing. Looks good otherwise. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13607#pullrequestreview-1402152540 From eosterlund at openjdk.org Wed Apr 26 15:17:53 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 26 Apr 2023 15:17:53 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v5] In-Reply-To: References: Message-ID: <-DqwpbRbPCNczggTPjvpqiHN0mdy6UG75FCn2Xidif0=.c7a5a5cb-3915-48a5-8d78-6065b46531ca@github.com> On Mon, 24 Apr 2023 16:50:59 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. 
I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - Replace NULL with nullptr in new code > - Merge branch 'master' into tkr-zgc > - Review fixes > - ... and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391 Looks good. Great work, @tkrodriguez. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/11996#pullrequestreview-1402224471 From coleenp at openjdk.org Wed Apr 26 16:06:26 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Apr 2023 16:06:26 GMT Subject: RFR: 8306851: Move Method access flags Message-ID: This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. Tested with tier1-6, and some manual verification of printing. 
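As a rough illustration of the split described in this message (and not the code in the PR), the C++ sketch below separates flags that are fixed at class file parse time from flags that change while the VM runs. All names and bit values here (`ParseTimeFlagsSketch`, `RuntimeFlagsSketch`, `kHasLoops`, `kQueued`, `kNotCompilable`) are hypothetical, and the use of `std::atomic` with relaxed ordering is an assumption made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, chosen only for this sketch.
constexpr uint32_t kHasLoops      = 1u << 0;  // known once the class file is parsed
constexpr uint32_t kQueued        = 1u << 0;  // toggled while the VM is running
constexpr uint32_t kNotCompilable = 1u << 1;  // toggled while the VM is running

// Parse-time flags: written before the method is visible to other threads,
// then only read, so a plain integer is enough.
class ParseTimeFlagsSketch {
  uint32_t _flags = 0;
 public:
  void set_at_parse_time(uint32_t bit) { _flags |= bit; }
  bool is_set(uint32_t bit) const { return (_flags & bit) != 0; }
};

// Runtime flags: several threads may set or clear different bits at the same
// time, so every update is an atomic read-modify-write on the whole word.
class RuntimeFlagsSketch {
  std::atomic<uint32_t> _flags{0};
 public:
  void set(uint32_t bit)   { _flags.fetch_or(bit, std::memory_order_relaxed); }
  void clear(uint32_t bit) { _flags.fetch_and(~bit, std::memory_order_relaxed); }
  bool is_set(uint32_t bit) const {
    return (_flags.load(std::memory_order_relaxed) & bit) != 0;
  }
};
```

The point of the split in this sketch is that a non-atomic `|=` on a shared word can lose a concurrent update to a neighbouring bit, which cannot happen for bits that are never written after parsing.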
------------- Commit messages: - fix comment - 8306851: Move Method access flags Changes: https://git.openjdk.org/jdk/pull/13654/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306851 Stats: 782 lines in 25 files changed: 321 ins; 303 del; 158 mod Patch: https://git.openjdk.org/jdk/pull/13654.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13654/head:pull/13654 PR: https://git.openjdk.org/jdk/pull/13654 From rrich at openjdk.org Wed Apr 26 16:12:53 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 26 Apr 2023 16:12:53 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v3] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:46:00 GMT, Fredrik Bredberg wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. >> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Put back assert in recurse_thaw_interpreted_frame > - Merge branch 'master' into freeze_thaw_interpreter_JDK-8300197_2023-01-19 > - Updated after review > - 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Looks good. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13477#pullrequestreview-1402328717 From rrich at openjdk.org Wed Apr 26 16:15:56 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 26 Apr 2023 16:15:56 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT Message-ID: With this pr I would like to wrap the body of the macro `offset_of` as defined for gcc in parentheses to fix the described issues with Eclipse CDT. As an alternative the [`(int)` cast in `byte_offset_of`](https://github.com/openjdk/jdk/blob/44d9f55d0b3c469988be6f1c47f0cfbc433c4490/src/hotspot/share/utilities/sizes.hpp#L59) could be removed. I preferred adding the parentheses to the gcc version of `offset_of` because the impact is smaller. Of course I'd rather have a local solution for the issue but I couldn't find one that didn't require changes in the source code. 
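The effect of the extra parentheses can be sketched as follows. The macros are hypothetical stand-ins, and it is an assumption made for illustration that the macro body is an immediately invoked lambda; the actual HotSpot definition of `offset_of` is not reproduced here. A compiler accepts both forms and yields the same value, so the wrapped form presumably only matters to tools that parse the source on their own.

```cpp
#include <cassert>
#include <cstddef>

struct Sample {
  char tag;
  int  value;
};

// Hypothetical macro whose expansion starts directly with a lambda.
// After a cast, the use site expands to "(int)[]() -> ... { ... }()".
#define FIELD_OFFSET_BARE(type, field) \
  []() -> std::size_t { return offsetof(type, field); }()

// Same body wrapped in parentheses, so the use site expands to
// "(int)( ... )", an ordinary cast of a parenthesized expression.
#define FIELD_OFFSET_WRAPPED(type, field) \
  ([]() -> std::size_t { return offsetof(type, field); }())

// Both helpers apply an (int) cast in front of the macro, mirroring the
// cast mentioned for byte_offset_of in the message above.
inline int bare_offset()    { return (int)FIELD_OFFSET_BARE(Sample, value); }
inline int wrapped_offset() { return (int)FIELD_OFFSET_WRAPPED(Sample, value); }
```

Wrapping the macro body rather than changing every caller keeps the change local to one definition, which matches the "smaller impact" argument made in the message.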
Testing: GHA ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/13668/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13668&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306901 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13668.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13668/head:pull/13668 PR: https://git.openjdk.org/jdk/pull/13668 From amitkumar at openjdk.org Wed Apr 26 16:16:01 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 16:16:01 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition [v5] In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Wed, 26 Apr 2023 10:32:49 GMT, Lutz Schmidt wrote: > Finally, a long standing PR comes to a successful end! @RealLucy I think it's time to reconsider this statement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12822#issuecomment-1523678359 From amitkumar at openjdk.org Wed Apr 26 16:16:02 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Apr 2023 16:16:02 GMT Subject: Integrated: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Thu, 2 Mar 2023 07:49:05 GMT, Amit Kumar wrote: > This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. This pull request has now been integrated.
Changeset: a18191fe Author: Amit Kumar Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/a18191fee8347c82764d3b2e2841d24d4670d47d Stats: 85 lines in 9 files changed: 3 ins; 39 del; 43 mod 8302328: [s390x] Simplify asm_assert definition Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/12822 From dcubed at openjdk.org Wed Apr 26 16:34:53 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 26 Apr 2023 16:34:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: Message-ID: On Thu, 20 Apr 2023 11:15:47 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary comments src/hotspot/cpu/aarch64/aarch64.ad line 3875: > 3873: __ b(cont); > 3874: } else { > 3875: assert(LockingMode == LM_LIGHTWEIGHT, ""); Perhaps should be: s/""/"must be"/ I'm not fond of empty assert mesgs. src/hotspot/cpu/aarch64/aarch64.ad line 3956: > 3954: __ b(cont); > 3955: } else { > 3956: assert(LockingMode == LM_LIGHTWEIGHT, ""); Perhaps should be: s/""/"must be"/ I'm not fond of empty assert mesgs.
src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1813: > 1811: __ br(Assembler::NE, slow_path_lock); > 1812: } else { > 1813: assert(LockingMode == LM_LIGHTWEIGHT, ""); Perhaps should be: s/""/"must be"/ I'm not fond of empty assert mesgs. src/hotspot/cpu/arm/c2_MacroAssembler_arm.cpp line 100: > 98: fast_lock_2(Roop /* obj */, Rbox /* t1 */, Rscratch /* t2 */, Rscratch2 /* t3 */, > 99: 1 /* savemask (save t1) */, > 100: done); Why not line up the '1' below the 'R' in Roop and join with the 'done);' line? src/hotspot/cpu/arm/c2_MacroAssembler_arm.cpp line 152: > 150: fast_unlock_2(Roop /* obj */, Rbox /* t1 */, Rscratch /* t2 */, Rscratch2 /* t3 */, > 151: 1 /* savemask (save t1) */, > 152: done); Why not line up the '1' below the 'R' in Roop and join with the 'done);' line? src/hotspot/cpu/arm/interp_masm_arm.cpp line 900: > 898: b(done); > 899: > 900: } else if (LockingMode == LM_LEGACY) { Why so many blank lines in this new block? src/hotspot/cpu/arm/interp_masm_arm.cpp line 1025: > 1023: fast_unlock_2(Robj /* obj */, Rlock /* t1 */, Rmark /* t2 */, Rtemp /* t3 */, > 1024: 1 /* savemask (save t1) */, > 1025: slow_case); Why not line up the '1' below the 'R' in Roop and join with the 'done);' line? 
src/hotspot/cpu/arm/macroAssembler_arm.cpp line 1803: > 1801: #ifdef ASSERT > 1802: // Poison scratch regs > 1803: POISON_REGS((~savemask), t1, t2, t3, 0x10000001); Should this poison value be: 0x20000002 src/hotspot/cpu/arm/macroAssembler_arm.cpp line 1811: > 1809: // Attempt to fast-unlock an object > 1810: // Registers: > 1811: // - obj: the object to be locked nit typo: s/locked/unlocked/ src/hotspot/cpu/arm/macroAssembler_arm.hpp line 1023: > 1021: // Attempt to fast-unlock an object > 1022: // Registers: > 1023: // - obj: the object to be locked nit typo: s/locked/unlocked/ src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 649: > 647: > 648: __ flush(); > 649: return AdapterHandlerLibrary::new_entry(fingerprint, i2c_entry, c2i_entry, c2i_unverified_entry); This change seems out of place... what's the story here? src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 1246: > 1244: if (LockingMode == LM_LIGHTWEIGHT) { > 1245: log_trace(fastlock2)("SharedRuntime unlock fast"); > 1246: __ fast_unlock_2(sync_obj, R2, tmp, Rtemp, 7, slow_unlock); No comments on the params like in other places... src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9694: > 9692: > 9693: // Now we attempt to take the fast-lock. > 9694: // Clear lowest two header bits (locked state). Perhaps: // Clear lock_mask bits (locked state). so that you don't tie this comment to the implementation size of the lock_mask. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9697: > 9695: andptr(hdr, ~(int32_t)markWord::lock_mask_in_place); > 9696: movptr(tmp, hdr); > 9697: // Set lowest bit (unlocked state). Perhaps: // Set unlocked_value bit. so that you don't tie this comment to the implementation of the unlocked_value being the lowest bit. I'm less worried about 'bit' versus 'bits' for this one.
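The header-bit manipulation that the two x86 comments above refer to can be sketched in portable C++ as follows. This is an illustration only, not the assembler code under review: the namespace `sketch`, the function names, and the use of `std::atomic` are assumptions, and the constant values (a two-bit lock mask, `01` for unlocked, `00` for locked) are taken from the PR description and the quoted comments. The sketch also omits the lock-stack push and pop that accompany these steps in the real change.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumed to mirror the mark-word constants named in the review comments.
constexpr std::uintptr_t lock_mask      = 0b11;  // the two lowest header bits
constexpr std::uintptr_t unlocked_value = 0b01;  // unlocked (neutral) state
constexpr std::uintptr_t locked_value   = 0b00;  // fast-locked state

// Attempt to fast-lock: succeeds only if the header is currently unlocked.
inline bool try_fast_lock(std::atomic<std::uintptr_t>& header) {
  std::uintptr_t hdr = header.load(std::memory_order_relaxed);
  std::uintptr_t locked   = (hdr & ~lock_mask) | locked_value;  // clear lock_mask bits
  std::uintptr_t expected = locked | unlocked_value;            // set unlocked_value bit
  // CAS unlocked -> locked; fails if another state (or thread) got there first.
  return header.compare_exchange_strong(expected, locked);
}

// Attempt to fast-unlock: succeeds only if the header is still fast-locked.
inline bool try_fast_unlock(std::atomic<std::uintptr_t>& header) {
  std::uintptr_t hdr = header.load(std::memory_order_relaxed);
  std::uintptr_t expected = (hdr & ~lock_mask) | locked_value;
  std::uintptr_t unlocked = expected | unlocked_value;
  // CAS locked -> unlocked; fails if the lock was inflated in the meantime.
  return header.compare_exchange_strong(expected, unlocked);
}

}  // namespace sketch
```

Writing the steps in terms of named constants rather than "lowest two bits" is the same decoupling the reviewer asks for in the comments.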
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9721: > 9719: assert_different_registers(obj, hdr, tmp); > 9720: > 9721: // Mark-word must be 00 now, try to swing it back to 01 (unlocked) Perhaps: // Mark-word must be lock_mask now, try to swing it back to unlocked_value. so that you don't tie this comment to the implementation values of lock_mask and unlocked_value. src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1717: > 1715: __ jcc(Assembler::notEqual, slow_path_lock); > 1716: } else { > 1717: assert(LockingMode == LM_LIGHTWEIGHT, ""); Perhaps should be: s/""/"must be"/ I'm not fond of empty assert mesgs. src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1876: > 1874: __ dec_held_monitor_count(); > 1875: } else { > 1876: assert(LockingMode == LM_LIGHTWEIGHT, ""); Perhaps should be: s/""/"must be"/ I'm not fond of empty assert mesgs. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2187: > 2185: __ jcc(Assembler::notEqual, slow_path_lock); > 2186: } else { > 2187: assert(LockingMode == LM_LIGHTWEIGHT, ""); Perhaps should be: s/""/"must be"/ I'm not fond of empty assert mesgs. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2331: > 2329: __ dec_held_monitor_count(); > 2330: } else { > 2331: assert(LockingMode == LM_LIGHTWEIGHT, ""); Perhaps should be: s/""/"must be"/ I'm not fond of empty assert mesgs. src/hotspot/share/interpreter/interpreterRuntime.cpp line 759: > 757: // also keep the BasicObjectLock, but we don't really need it anyway, we only need > 758: // the object. See also InterpreterMacroAssembler::lock_object(). > 759: // As soon as traditional stack-locking goes away we could remove the other monitorenter() entry Perhaps: s/traditional/legacy/ for terminology consistency... src/hotspot/share/logging/logTag.hpp line 80: > 78: LOG_TAG(exceptions) \ > 79: LOG_TAG(exit) \ > 80: LOG_TAG(fastlock2) \ So why 'fastlock2'? Where's 'fastlock1'? Or 'fastlock'? 
src/hotspot/share/oops/markWord.hpp line 175: > 173: } > 174: bool has_locker() const { > 175: assert(LockingMode == LM_LEGACY, "should only be called with traditional stack locking"); Perhaps: s/traditional/legacy/ for terminology consistency... src/hotspot/share/runtime/arguments.cpp line 1994: > 1992: if (UseHeavyMonitors) { > 1993: FLAG_SET_CMDLINE(LockingMode, LM_MONITOR); > 1994: } HotSpot option processing has a general rule of last setting wins. With L1992-1994 here, I think there might be a problem with a cmd line that specifies: -XX:+UseHeavyMonitors -XX:LockingMode=1 I think that the resulting value of `LockingMode` will be `LM_MONITOR` instead of `LM_LEGACY`. Granted mixing uses of `UseHeavyMonitors` with `LockingMode` is just asking for trouble... src/hotspot/share/runtime/globals.hpp line 1981: > 1979: "Select locking mode: " \ > 1980: "0: monitors only, " \ > 1981: "1: monitors & traditional stack-locking (default), " \ Perhaps: s/traditional/legacy/ to be consistent with terminology... src/hotspot/share/runtime/javaThread.hpp line 1162: > 1160: > 1161: > 1162: static OopStorage* thread_oop_storage(); nit: delete extra blank line on L1161 src/hotspot/share/runtime/lockStack.cpp line 41: > 39: LockStack::LockStack(JavaThread* jt) : > 40: _top(lock_stack_base_offset), _base() > 41: { nit: '{' on L41 should be at the end of L40 (after a space).
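As background for the `LockStack` comments, here is a minimal sketch of the per-thread lock-stack as the PR description outlines it: a small fixed-size array (8 entries) that is scanned linearly to answer "does the current thread own this object?". Everything in namespace `sketch` is hypothetical. In particular, this version tracks `_top` as an index rather than the byte offset the reviewed code appears to use, treats `oop` as an opaque pointer, and reports "full" or "already present" by returning `false` so the caller can take the slow path, whereas the reviewed code asserts uniqueness instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

using oop = const void*;  // stand-in for an object reference

class LockStack {
  static constexpr std::size_t CAPACITY = 8;  // size quoted in the PR description
  std::array<oop, CAPACITY> _base{};
  std::size_t _top = 0;                       // number of entries in use

 public:
  bool is_full() const { return _top == CAPACITY; }

  // Linear scan; cheap because the stack is expected to stay very small.
  bool contains(oop o) const {
    for (std::size_t i = 0; i < _top; ++i) {
      if (_base[i] == o) return true;
    }
    return false;
  }

  // Returns false when the caller should fall back to the slow path:
  // either the stack is full or the object is already locked (recursion).
  bool push(oop o) {
    if (is_full() || contains(o)) return false;
    _base[_top++] = o;
    return true;
  }

  // Removes an entry, shifting later entries down to keep the array dense.
  void remove(oop o) {
    for (std::size_t i = 0; i < _top; ++i) {
      if (_base[i] == o) {
        for (std::size_t j = i + 1; j < _top; ++j) _base[j - 1] = _base[j];
        --_top;
        return;
      }
    }
    assert(false && "entry must be present");
  }
};

}  // namespace sketch
```

Keeping the entries dense is what would let the whole structure be saved and restored as one block, which is the property the description hopes to exploit for virtual threads.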
src/hotspot/share/runtime/lockStack.cpp line 63: > 61: #ifndef PRODUCT > 62: void LockStack::verify(const char* msg) const { > 63: assert(LockingMode == LM_LIGHTWEIGHT, "never use lock-stack when fast-locking is disabled"); Perhaps: s/fast-locking/light weight locking/ src/hotspot/share/runtime/lockStack.cpp line 64: > 62: void LockStack::verify(const char* msg) const { > 63: assert(LockingMode == LM_LIGHTWEIGHT, "never use lock-stack when fast-locking is disabled"); > 64: assert((_top <= end_offset()), "lockstack overflow: _top %d end_offset %d", _top, end_offset()); nit: extra space after `<=` src/hotspot/share/runtime/lockStack.cpp line 65: > 63: assert(LockingMode == LM_LIGHTWEIGHT, "never use lock-stack when fast-locking is disabled"); > 64: assert((_top <= end_offset()), "lockstack overflow: _top %d end_offset %d", _top, end_offset()); > 65: assert((_top >= start_offset()), "lockstack underflow: _topt %d end_offset %d", _top, start_offset()); nit typo: s/_topt/_top/ src/hotspot/share/runtime/lockStack.inline.hpp line 74: > 72: _base[to_index(_top)] = nullptr; > 73: #endif > 74: assert(!contains(o), "entries must be unique"); Perhaps: assert(!contains(o), "entries must be unique: " PTR_FORMAT, p2i(o)); src/hotspot/share/runtime/lockStack.inline.hpp line 81: > 79: inline void LockStack::remove(oop o) { > 80: verify("pre-remove"); > 81: assert(contains(o), "entry must be present"); Perhaps: assert(contains(o), "entry must be present: " PTR_FORMAT, p2i(o)); src/hotspot/share/runtime/synchronizer.cpp line 506: > 504: if (!useHeavyMonitors()) { > 505: if (LockingMode == LM_LIGHTWEIGHT) { > 506: // Fast-locking does not use the 'lock' argument.. nit: extra period at the end. src/hotspot/share/runtime/synchronizer.cpp line 1049: > 1047: > 1048: if (mark.has_monitor()) { > 1049: // inflated monitor so header points to ObjectMonitor (tagged pointer). 
nit: s/inflated/Inflated/ src/hotspot/share/runtime/synchronizer.cpp line 1077: > 1075: > 1076: if (mark.has_monitor()) { > 1077: // inflated monitor so header points to ObjectMonitor (tagged pointer). nit: s/inflated/Inflated/ src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 213: > 211: // refer to Threads::owning_thread_from_monitor_owner > 212: public JavaThread owningThreadFromMonitor(Address o) { > 213: assert(VM.getVM().getCommandLineFlag("LockingMode").getInt() != 2); Please put a comment after that literal '2': assert(VM.getVM().getCommandLineFlag("LockingMode").getInt() != 2 /* LM_LIGHTWEIGHT */); src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 231: > 229: > 230: public JavaThread owningThreadFromMonitor(ObjectMonitor monitor) { > 231: if (VM.getVM().getCommandLineFlag("LockingMode").getInt() == 2) { Please put a comment after that literal '2': if (VM.getVM().getCommandLineFlag("LockingMode").getInt() == 2 /* LM_LIGHTWEIGHT */) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175779923 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175782883 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175838145 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175841805 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175841903 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175842779 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175843663 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175846203 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175845306 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175846631 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175847483 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1175847916 
PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176939876 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176943266 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176946313 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176950471 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176951363 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176952831 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176953184 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176960883 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176963809 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1176965777 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177061329 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177064110 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177066319 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177067974 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177069462 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177071139 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177071982 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177085020 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1177084574 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178061523 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178072667 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178076085 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178100632 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178101434 From dcubed at openjdk.org Wed Apr 26 16:34:56 2023 
From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 26 Apr 2023 16:34:56 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> Message-ID: On Wed, 26 Apr 2023 10:43:48 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 701: >> >>> 699: // ZFlag == 0 count in slow path >>> 700: jccb(Assembler::notZero, NO_COUNT); // jump if ZFlag == 0 >>> 701: >> >> `DONE_LABEL` is conditionally jumped into from a lot of places, the only path it is reached without known `ZF` seems to be `LM_LEGAGY` fall-through. Maybe refactor a little to eliminate this block. > > I intentionally have not changed the existing paths to make it absolutely clear that the old behaviour is not changed. I'd rather make any changes to the stack-locking in a separate follow-up. Thanks for minimizing changes to the old/legacy code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178113575 From dcubed at openjdk.org Wed Apr 26 16:35:23 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 26 Apr 2023 16:35:23 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v63] In-Reply-To: <89qIyO7ZpW-n-BqgskUr0vD04rjzu7ugBJZj7t5sonA=.2f7ed1ec-a2db-4042-97fd-68db0b02b292@github.com> References: <89qIyO7ZpW-n-BqgskUr0vD04rjzu7ugBJZj7t5sonA=.2f7ed1ec-a2db-4042-97fd-68db0b02b292@github.com> Message-ID: On Wed, 26 Apr 2023 13:16:26 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). 
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. 
>> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? 
>> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Suggested changes by @merykitty src/hotspot/share/runtime/synchronizer.cpp line 994: > 992: // Cannot have assertion since this object may have been > 993: // locked by another thread when reaching here. > 994: // assert(mark.is_neutral(), "sanity check"); Hmmm... why delete this comment block? It's there to document the racy nature of this function... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178078300 From dcubed at openjdk.org Wed Apr 26 16:47:00 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 26 Apr 2023 16:47:00 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v60] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 19:42:35 GMT, Roman Kennke wrote: >> Hi there, >> what is needed to bring this PR over the approval line? > >> @rkennke - I'm planning to do another crawl thru review next week. > > Thanks! That is greatly appreciated! @rkennke - finished my second crawl thru review of 60/68 files changed. I only skipped the RISC-V files since I know nada about that platform... My Mach5 testing of v61 is running Tier7 and I hope to start Tier8 later tonight. So far all testing looks good, but I'll include the usual summary comment in the bug report... ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1523702754 From vlivanov at openjdk.org Wed Apr 26 17:16:52 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Apr 2023 17:16:52 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 18:00:39 GMT, Ashutosh Mehra wrote: > This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive.
Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. The constants are declared in `src/hotspot/share/utilities/accessFlags.hpp` and I see corresponding accessors being used on `InstanceKlass`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1523761656 From cslucas at openjdk.org Wed Apr 26 17:28:53 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 26 Apr 2023 17:28:53 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v11] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <6I1KVkFSekhMTTDq6nXQNoKPE96bycERRtsPrTnZZvU=.c1933f7f-e659-4e22-93a3-e7fbbcdf53a1@github.com> > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. 
I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following: 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge; 3) Add a new class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.
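For readers less familiar with the term, the kind of allocation merge the description talks about can be sketched in Java roughly as below. This is only an illustration: `Point`, `sumAfterMerge` and the class name are hypothetical and not taken from the PR, and whether C2 actually scalar replaces these allocations depends on inlining and escape analysis decisions made at run time.

```java
// Illustrative sketch only: all names here are hypothetical, not code from the PR.
final class AllocationMergeSketch {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int sumAfterMerge(boolean cond, int a, int b) {
        Point p = new Point(a, b);      // first allocation
        if (cond) {
            p = new Point(b, a + 1);    // second allocation
        }
        // Here the compiler's IR contains a Phi that merges the two
        // allocations. Neither object escapes the method, and the merged
        // value is only used as the base of the two field loads below.
        return p.x + p.y;
    }

    public static void main(String[] args) {
        System.out.println(sumAfterMerge(false, 1, 2));
        System.out.println(sumAfterMerge(true, 1, 2));
    }
}
```

The field loads after the `if` correspond to the "base for field loads" case mentioned in the description; a safepoint or uncommon trap in the same position would correspond to the debug information case.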
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address part of PR review 4 & fix a bug setting only_candidate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/329d9f40..78435065 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=09-10 Stats: 72 lines in 8 files changed: 15 ins; 17 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From vlivanov at openjdk.org Wed Apr 26 17:53:25 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Apr 2023 17:53:25 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: <5HmQ644vtSgf7NylFeEsQXfU8DC9W-zk3ayYq373LF4=.8fba7762-07b2-45b8-a002-1ad2c0c05b0e@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <8OlLs3nmBxAKP_OcaZPhC3g1lNpfXJQ6zYCx2XNB43A=.055edcf3-96b7-4f97-9153-fdfe26bb0c0b@github.com> <5HmQ644vtSgf7NylFeEsQXfU8DC9W-zk3ayYq373LF4=.8fba7762-07b2-45b8-a002-1ad2c0c05b0e@github.com> Message-ID: On Tue, 25 Apr 2023 21:37:11 GMT, Cesar Soares Lucas wrote: > ObjectValue is not just a candidate. I.e., the ObjectValue is also used independently of the merge. And now I'm wondering how it all plays with `is_only_merge_candidate()`/`set_merge_candidate()`... Is it possible for an `ObjectValue` to be shared between multiple merges?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1178207147 From kvn at openjdk.org Wed Apr 26 17:54:23 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Apr 2023 17:54:23 GMT Subject: RFR: JDK-8299229: Allow UseZGC with JVMCI and enable nmethod entry barrier support [v5] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 16:50:59 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - Replace NULL with nullptr in new code > - Merge branch 'master' into tkr-zgc > - Review fixes > - ... and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391 Please run the usual testing before integration.
------------- PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1523806761 From cjplummer at openjdk.org Wed Apr 26 18:03:04 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 26 Apr 2023 18:03:04 GMT Subject: RFR: 8306851: Move Method access flags In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 19:09:23 GMT, Coleen Phillimore wrote: > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. > > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. The SA changes look good. I think these changes make @iklam's #13663 fix unnecessary, but harmless. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13654#pullrequestreview-1402521002 From stuefe at openjdk.org Wed Apr 26 18:39:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Apr 2023 18:39:23 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 09:42:52 GMT, Richard Reingruber wrote: > With this pr I would like to wrap the body of the macro `offset_of` as defined for gcc in parentheses to fix the described issues with Eclipse CDT. > As an alternative the [`(int)` cast in `byte_offset_of`](https://github.com/openjdk/jdk/blob/44d9f55d0b3c469988be6f1c47f0cfbc433c4490/src/hotspot/share/utilities/sizes.hpp#L59) could be removed. I preferred adding the parentheses to the gcc version of `offset_of` because the impact is smaller. 
> > Of course I'd rather have a local solution for the issue but I couldn't find one that didn't require changes in the source code. > > Testing: GHA Looks good. Thanks for fixing this, I had noticed this in CDT myself. I never looked closely at this macro though. I wish we could just use offsetof. Also I wonder whether the ATTRIBUTE_ALIGNED(16) would be more correctly `alignas(klass)` but that is pre-existing. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13668#pullrequestreview-1402562083 From cslucas at openjdk.org Wed Apr 26 18:52:00 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 26 Apr 2023 18:52:00 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <8OlLs3nmBxAKP_OcaZPhC3g1lNpfXJQ6zYCx2XNB43A=.055edcf3-96b7-4f97-9153-fdfe26bb0c0b@github.com> <5HmQ644vtSgf7NylFeEsQXfU8DC9W-zk3ayYq373LF4=.8fba7762-07b2-45b8-a002-1ad2c0c05b0e@github.com> Message-ID: <06yEWgFGq-XvgFp61UllqFWWpkYSY9fMqQRmkaNwi7Y=.2316f990-2ba1-4fa0-848e-b7e84fc744f7@github.com> On Wed, 26 Apr 2023 17:42:23 GMT, Vladimir Ivanov wrote: > Is it possible for an ObjectValue to be shared between multiple merges? When I posted my previous comment I thought that could happen. But now I realize that in the current implementation that won't happen: an ObjectValue is created for a combination of Phi x SafePointNode. However, one situation would _require_ sharing the ObjectValue in multiple merges: when different merges that share at least one SR input are used as debug info _in the same_ SafePointNode. It's required because in the same SafePointNode all ObjectValues coming from the same Allocate need to have the same value. I think the example below will trigger that - I'll check and patch the current implementation to not RAM in that case.
    Point p = new Point();
    Point q = new Point();
    Point r = new Point();
    if (cond_one) p = q;
    if (cond_two) r = q;
    trap(p, q, r);

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1178259628 From rkennke at openjdk.org Wed Apr 26 19:04:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Apr 2023 19:04:28 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v60] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 19:42:35 GMT, Roman Kennke wrote: >> Hi there, >> what is needed to bring this PR over the approval line? > >> @rkennke - I'm planning to do another crawl thru review next week. > > Thanks! That is greatly appreciated! > @rkennke - finished my second crawl thru review of 60/68 files changed. I only skipped the RISC-V files since I know nada about that platform... > > My Mach5 testing of v61 is running Tier7 and I hope to start Tier8 later tonight. So far all testing looks good, but I'll include the usual summary comment in the bug report... Thanks so much for reviewing this large PR (so many times)! I believe I have incorporated all your suggestions (or left a comment/question when it wasn't clear). Cheers, Roman ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1523901201 From rkennke at openjdk.org Wed Apr 26 19:04:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Apr 2023 19:04:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 22:51:10 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary comments > > src/hotspot/cpu/arm/macroAssembler_arm.cpp line 1803: > >> 1801: #ifdef ASSERT >> 1802: // Poison scratch regs >> 1803: POISON_REGS((~savemask), t1, t2, t3, 0x10000001); > > Should this poison value be: 0x20000002 Why?
> src/hotspot/share/logging/logTag.hpp line 80: > >> 78: LOG_TAG(exceptions) \ >> 79: LOG_TAG(exit) \ >> 80: LOG_TAG(fastlock2) \ > > So why 'fastlock2'? Where's 'fastlock1'? Or 'fastlock'? It's currently only used in the arm port. I'm changing it to 'fastlock', ok? > src/hotspot/share/runtime/arguments.cpp line 1994: > >> 1992: if (UseHeavyMonitors) { >> 1993: FLAG_SET_CMDLINE(LockingMode, LM_MONITOR); >> 1994: } > > HotSpot option processing has a general rule of last setting wins. > With L1992-1994 here, I think there might be a problem with a cmd > line that specifies: > > -XX:+UseHeavyMonitors -XX:LockingMode=1 > > I think that the resulting value of `LockingMode` will be `LM_MONITOR` > instead of `LM_LEGACY`. Granted mixing uses of `UseHeavyMonitors` > with `LockingMode` is just asking for trouble... I added a check that rejects conflicting +UseHeavyMonitors and LockingMode=X flags on the cmd line. UseHeavyMonitors is already deprecated, and with the new LockingMode flag should be removed asap (in a follow-up PR). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178244295 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178253708 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178260893 From stuefe at openjdk.org Wed Apr 26 19:04:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Apr 2023 19:04:33 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: Message-ID: <1M7ql10BxRIBDp38CNNb_D0i6CE4O-97lGFO7iDRaFI=.570de882-cda2-4582-a4eb-c49cbc38aff6@github.com> On Mon, 24 Apr 2023 22:51:10 GMT, Daniel D. 
Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary comments > > src/hotspot/cpu/arm/macroAssembler_arm.cpp line 1803: > >> 1801: #ifdef ASSERT >> 1802: // Poison scratch regs >> 1803: POISON_REGS((~savemask), t1, t2, t3, 0x10000001); > > Should this poison value be: 0x20000002 The poison values were something I used during development of the arm part, bug hunting wise. Though I think they make sense in general. I agree with @dcubed-ojdk, 0x20000002 would be the most logical value here. Either that or remove the poisoning (though it had been useful). > src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 649: > >> 647: >> 648: __ flush(); >> 649: return AdapterHandlerLibrary::new_entry(fingerprint, i2c_entry, c2i_entry, c2i_unverified_entry); > > This change seems out of place... what's the story here? This is a local revert of *8303154: Investigate and improve instruction cache flushing during compilation* - the missing flush caused random crashes, but I did not have time to investigate. I reverted the flush, crashes were gone. If needed I may revisit this when there is time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178250178 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178255706 From coleenp at openjdk.org Wed Apr 26 19:56:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Apr 2023 19:56:54 GMT Subject: RFR: 8306851: Move Method access flags In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 19:09:23 GMT, Coleen Phillimore wrote: > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access.
> > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. Thanks Chris. We were wondering what to do with JVM_RECOGNIZED_METHOD_MODIFIERS but we'll clean them up in another pass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13654#issuecomment-1523947642 From shade at openjdk.org Wed Apr 26 20:22:54 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Apr 2023 20:22:54 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v12] In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. 
Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Fix Amazon copyright - Merge branch 'master' into JDK-83050920-thread-sleep-subms - Drop nanos_to_nanos_bounded - Handle overflows - More review comments - Adjust test times - Windows again - Windows fixes: align(...) is only for power-of-two alignments - ... and 16 more: https://git.openjdk.org/jdk/compare/35e7bc21...da8f0f8c ------------- Changes: https://git.openjdk.org/jdk/pull/13225/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13225&range=11 Stats: 254 lines in 11 files changed: 226 ins; 9 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/13225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13225/head:pull/13225 PR: https://git.openjdk.org/jdk/pull/13225 From dcubed at openjdk.org Wed Apr 26 21:08:19 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 26 Apr 2023 21:08:19 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: <1M7ql10BxRIBDp38CNNb_D0i6CE4O-97lGFO7iDRaFI=.570de882-cda2-4582-a4eb-c49cbc38aff6@github.com> References: <1M7ql10BxRIBDp38CNNb_D0i6CE4O-97lGFO7iDRaFI=.570de882-cda2-4582-a4eb-c49cbc38aff6@github.com> Message-ID: On Wed, 26 Apr 2023 18:29:36 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/arm/macroAssembler_arm.cpp line 1803: >> >>> 1801: #ifdef ASSERT >>> 1802: // Poison scratch regs >>> 1803: POISON_REGS((~savemask), t1, t2, t3, 0x10000001); >> >> Should this poison value be: 0x20000002 > > The poison values were something I used during development of the arm part, bug hunting wise. Though I think they make sense in general. I agree with @dcubed-ojdk, 0x20000002 would be the most logical value here. Either that or remove the poisoning (though it had been useful). 0x20000002 because 0x10000001 was used earlier and it would be good to keep the poisoning values unique. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178390683 From dcubed at openjdk.org Wed Apr 26 21:08:21 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 26 Apr 2023 21:08:21 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: <9aK0kPPAYJYmaffnYZ47rIkDC8El1snix1GhTeUTBnE=.54c3e35d-b285-4c04-9e3f-ab2751902724@github.com> <_Y6eLacV_ecmijvlzo2lGe-U5n6ZtaJnUA6KL9BsJJw=.a66f23b0-aaf6-43ac-a210-ad830a1e744c@github.com> Message-ID: On Wed, 26 Apr 2023 11:00:33 GMT, Quan Anh Mai wrote: >> How would I check if we are emitting code? >> >> I am not sure I understand. The check for ANONYMOUS is only relevant when we observe an already-inflated monitor. I think this is the right place to put it.
> > The entry barrier does this: > > https://github.com/openjdk/jdk/blob/86f41a4c42268d364175263804eb4d1ce82fa943/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L139 > > `testptr(tmpReg, markWord::monitor_value)` is checking for inflation, and the following `if` block acts when inflation is detected, what I mean is to move the whole enclosed if down out of the `if (LockingMode != LM_MONITOR)` It took a couple of re-reads to figure this out. You've added a scratch emit size check before generating the C2HandleAnonOMOwnerStub. For some reason, the way that GitHub shows those changes really confused my brain... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178116470 From dcubed at openjdk.org Wed Apr 26 21:08:23 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 26 Apr 2023 21:08:23 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 18:33:28 GMT, Roman Kennke wrote: >> src/hotspot/share/logging/logTag.hpp line 80: >> >>> 78: LOG_TAG(exceptions) \ >>> 79: LOG_TAG(exit) \ >>> 80: LOG_TAG(fastlock2) \ >> >> So why 'fastlock2'? Where's 'fastlock1'? Or 'fastlock'? > > It's currently only used in the arm port. I'm changing it to 'fastlock', ok? Yup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1178391272 From kbarrett at openjdk.org Wed Apr 26 22:00:52 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Apr 2023 22:00:52 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 09:42:52 GMT, Richard Reingruber wrote: > With this pr I would like to wrap the body of the macro `offset_of` as defined for gcc in parentheses to fix the described issues with Eclipse CDT. 
> As an alternative the [`(int)` cast in `byte_offset_of`](https://github.com/openjdk/jdk/blob/44d9f55d0b3c469988be6f1c47f0cfbc433c4490/src/hotspot/share/utilities/sizes.hpp#L59) could be removed. I preferred adding the parentheses to the gcc version of `offset_of` because the impact is smaller. > > Of course I'd rather have a local solution for the issue but I couldn't find one that didn't require changes in the source code. > > Testing: GHA Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13668#pullrequestreview-1402863349 From dholmes at openjdk.org Thu Apr 27 01:56:54 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Apr 2023 01:56:54 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v12] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 26 Apr 2023 20:22:54 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. 
Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge branch 'master' into JDK-83050920-thread-sleep-subms > - Merge branch 'master' into JDK-83050920-thread-sleep-subms > - Fix Amazon copyright > - Merge branch 'master' into JDK-83050920-thread-sleep-subms > - Drop nanos_to_nanos_bounded > - Handle overflows > - More review comments > - Adjust test times > - Windows again > - Windows fixes: align(...) is only for power-of-two alignments > - ... and 16 more: https://git.openjdk.org/jdk/compare/35e7bc21...da8f0f8c Nothing further from me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13225#pullrequestreview-1403054308 From sspitsyn at openjdk.org Thu Apr 27 03:46:29 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 27 Apr 2023 03:46:29 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v4] In-Reply-To: References: Message-ID: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. 
Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into br29 - do more refactoring including VirtualThread class - Merge - 8304444: Reappearance of NULL in jvmtiThreadState.cpp - 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions ------------- Changes: https://git.openjdk.org/jdk/pull/13484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=03 Stats: 331 lines in 16 files changed: 182 ins; 71 del; 78 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From sspitsyn at openjdk.org Thu Apr 27 03:55:54 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 27 Apr 2023 03:55:54 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v3] In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 22:02:20 GMT, Serguei Spitsyn wrote: >> This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. >> >> Testing: mach5 tiers 1-6 were successful. > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge > - 8304444: Reappearance of NULL in jvmtiThreadState.cpp > - 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions Added an update with the refactoring prepared by @pchilano. This update includes some renaming to make function names more consistent. The mach5 runs of tiers 1-6 all passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13484#issuecomment-1524617260 From sspitsyn at openjdk.org Thu Apr 27 04:52:53 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 27 Apr 2023 04:52:53 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v5] In-Reply-To: References: Message-ID: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'br29' of https://github.com/sspitsyn/jdk into br29 merge with branch29 - move code a little bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13484/files - new: https://git.openjdk.org/jdk/pull/13484/files/639b2110..debe49c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=03-04 Stats: 52 lines in 2 files changed: 24 ins; 24 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From dholmes at openjdk.org Thu Apr 27 05:00:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Apr 2023 05:00:53 GMT Subject: RFR: 8306851: Move Method access flags In-Reply-To: References: Message-ID: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> On Tue, 25 Apr 2023 19:09:23 GMT, Coleen Phillimore wrote: > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. 
> > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. General idea is good but I have some issues with naming inconsistencies. A few other queries. Thanks. src/hotspot/share/classfile/classFileParser.cpp line 2741: > 2739: parsed_annotations.apply_to(methodHandle(THREAD, m)); > 2740: > 2741: if (is_hidden() && !m->is_hidden()) { // Mark methods in hidden classes as 'hidden'. This seems odd - how would m already be marked hidden? And why do we care? The check-and-branch is more expensive than just setting the field. src/hotspot/share/oops/constMethodFlags.hpp line 34: > 32: > 33: // The ConstMethodFlags class contains the parse-time flags associated with > 34: // an Method, and their associated accessors. s/an/a/ s/their/its/ src/hotspot/share/oops/constMethodFlags.hpp line 53: > 51: flag(has_type_annotations , 1 << 9) \ > 52: flag(has_default_annotations , 1 << 10) \ > 53: flag(caller_sensitive , 1 << 11) \ Nit: we should consistently use either `x` or `is_x` for `x` in `overpass, caller_sensitive, hidden, scoped, ...` src/hotspot/share/oops/method.cpp line 735: > 733: case Bytecodes::_jsr: > 734: if (bcs.dest() < bcs.next_bci()) { > 735: return set_has_loops(); I don't understand the new return logic here. The break gets us out of the switch but we are still in the while loop, but the return takes us all the way out. ??? src/hotspot/share/oops/method.hpp line 808: > 806: > 807: bool is_hidden() const { return constMethod()->is_hidden(); } > 808: void set_hidden() { constMethod()->set_is_hidden(); } The naming is inconsistent here regarding `is` - either both should have it or neither IMO. 
src/hotspot/share/oops/method.hpp line 871: > 869: void clear_not_c2_compilable() { set_not_c2_compilable(false); } > 870: > 871: bool is_not_c1_osr_compilable() const { return not_c1_compilable(); } // don't waste a flags bit Again inconsistent naming with `is` but also `osr` in this case. I would expect being compilable to be quite different to being "osr compilable". ------------- PR Review: https://git.openjdk.org/jdk/pull/13654#pullrequestreview-1403149129 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1178611573 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1178613648 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1178614433 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1178618364 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1178619597 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1178620558 From dholmes at openjdk.org Thu Apr 27 05:01:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Apr 2023 05:01:23 GMT Subject: RFR: 8306851: Move Method access flags In-Reply-To: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> References: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> Message-ID: On Thu, 27 Apr 2023 04:49:10 GMT, David Holmes wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. 
>> >> Tested with tier1-6, and some manual verification of printing. > > src/hotspot/share/oops/method.hpp line 871: > >> 869: void clear_not_c2_compilable() { set_not_c2_compilable(false); } >> 870: >> 871: bool is_not_c1_osr_compilable() const { return not_c1_compilable(); } // don't waste a flags bit > > Again inconsistent naming with `is` but also `osr` in this case. I would expect being compilable to be quite different to being "osr compilable". Ah I get this bit now - but the comment didn't make it clear. For C1 compilable and osr-compilable are considered the same. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1178623386 From dholmes at openjdk.org Thu Apr 27 05:23:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Apr 2023 05:23:23 GMT Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 01:06:23 GMT, Leonid Mesnik wrote: > The ProcessTools.startProcess (...) has been updated to completely read streams after process has been completed. > The test was updated to run 5 times with different number of lines and line sizes. I missed the previous changes to ProcessTools as I was away. This code is so complicated that I find it impossible to discern if it is actually doing the right thing. But I'm concerned by the added FutureTask usage. test/lib-test/jdk/test/lib/process/ProcessToolsStartProcessTest.java line 79: > 77: System.out.print("FAILED: wrong number of lines in Consumer output\n"); > 78: success = false; > 79: System.out.print(out.getStdout()); Why isn't this printing output? test/lib-test/jdk/test/lib/process/ProcessToolsStartProcessTest.java line 95: > 93: public static void main(String[] args) throws Exception { > 94: if (args.length > 0) { > 95: for (int i = 0; i < Integer.parseInt(args[0]); i++) { This will call parseInt on each iteration of the loop. 
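A minimal sketch of the fix this comment points toward, assuming a hypothetical `runOnce` helper that stands in for the real test body (the class and method names here are illustrative and not taken from the PR): the iteration count is parsed once, before the loop, instead of inside the loop condition.

```java
class ParseOnceExample {
    // Hypothetical stand-in for one iteration of the test; it only counts invocations.
    static int runs = 0;

    static void runOnce() {
        runs++;
    }

    // Parses the iteration count once, before the loop, so Integer.parseInt
    // is not re-evaluated every time the loop condition is checked.
    static int runAll(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        for (int i = 0; i < iterations; i++) {
            runOnce();
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(runAll(args));
    }
}
```

The behavior is unchanged; the only difference is that the string is parsed a single time.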
test/lib/jdk/test/lib/process/ProcessTools.java line 183: > 181: // It is needed to wait until stream is flushed after > 182: // process is completed. > 183: task.get(); This looks problematic - if you block here you are holding the object monitor as this is a synchronized method. ------------- PR Review: https://git.openjdk.org/jdk/pull/13683#pullrequestreview-1403178899 PR Review Comment: https://git.openjdk.org/jdk/pull/13683#discussion_r1178625211 PR Review Comment: https://git.openjdk.org/jdk/pull/13683#discussion_r1178627264 PR Review Comment: https://git.openjdk.org/jdk/pull/13683#discussion_r1178629324 From dholmes at openjdk.org Thu Apr 27 06:05:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Apr 2023 06:05:23 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: References: Message-ID: <1z2ZOeorI_1mGoZ5NvNqutTolY7XrpA5LyZvxzkyOhU=.11e67f25-138b-462f-b966-f296bdf86c2f@github.com> On Mon, 24 Apr 2023 08:01:40 GMT, Wojciech Kudla wrote: >> As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. >> This is immensely useful for investigating time-to-safepoint issues in low latency space. > > Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: > > Fixed jlong conversion order Seems fine in principle. Please add this flag as a testcase for test/hotspot/jtreg/runtime/CommandLine/DoubleFlagWithIntegerValue.java. Change requested below. Thanks. src/hotspot/share/utilities/globalDefinitions.hpp line 170: > 168: > 169: // Format jdouble with defined precision > 170: #define JDOUBLE_FORMAT_P(precision) "%." #precision "f" This is not necessary. We only define macros when there are platform differences with format specifiers. Just use `%.6f` directly in the code. 
------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13373#pullrequestreview-1403242504 PR Review Comment: https://git.openjdk.org/jdk/pull/13373#discussion_r1178657022 From duke at openjdk.org Thu Apr 27 06:07:52 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 27 Apr 2023 06:07:52 GMT Subject: RFR: 8305590: Remove nothrow exception specifications from operator new [v3] In-Reply-To: References: <5uZmZV3ssLzKKFBeajmWUBmZsaZeq-3ImeHkzCvnSwg=.4941a9c0-b250-4290-828a-15416ed791df@github.com> <7EfFlEYBncjzdoZzU_gdl8tB7kpj_z3ga_h-fXxy0Os=.42af4f0f-e003-4a54-bf79-603c80bdd12f@github.com> Message-ID: On Mon, 24 Apr 2023 00:23:18 GMT, Kim Barrett wrote: > I believe this may have missed removing the exception specifier from an operator new inside AnyObj, allocation.cpp, since gcc 12 and up on my end now refuses to compile HotSpot with this change. I'll create a cleanup change for this, if there isn't any opposition to that To be able to work on this, please file an issue for this case including the error and the way to reproduce it. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13498#issuecomment-1524762797 From dholmes at openjdk.org Thu Apr 27 06:29:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Apr 2023 06:29:53 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 14:26:02 GMT, Martin Doerr wrote: >> Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed jlong conversion order > > Would be helpful if you could enable the Pre-submit test (GitHub actions). @TheRealMDoerr , @w-kudla I have filed the CSR request for this change and reviewed it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1524802135 From aboldtch at openjdk.org Thu Apr 27 07:16:23 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Apr 2023 07:16:23 GMT Subject: Integrated: 8306732: TruncatedSeq::predict_next() attempts linear regression with only one data point In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 11:51:06 GMT, Axel Boldt-Christmas wrote: > TruncatedSeq::predict_next() attempts linear regression with only one data point, which leads to a division by zero. (There are infinitely many linear functions that fit a single point equally well.) > > I suggest we do what we do for the zero points case, namely pick one of the linear functions. > > For zero points the current version picks `y = 0 + 0*x` and the suggestion is that for one point `P` the function `y = P_y + 0*x` is picked. > > Tested for ZGC tier1-7 on Oracle supported platforms. Only ZGC uses TruncatedSeq::predict_next(). This pull request has now been integrated. Changeset: 748476fd Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/748476fd80ec93c25d823bc5088c706fcf3c7e65 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod 8306732: TruncatedSeq::predict_next() attempts linear regression with only one data point Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/13614 From lmesnik at openjdk.org Thu Apr 27 07:18:23 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 27 Apr 2023 07:18:23 GMT Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" [v2] In-Reply-To: References: Message-ID: > The ProcessTools.startProcess (...) has been updated to completely read streams after process has been completed. > The test was updated to run 5 times with different number of lines and line sizes.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13683/files - new: https://git.openjdk.org/jdk/pull/13683/files/c16d33f8..10f1c5c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13683&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13683&range=00-01 Stats: 9 lines in 2 files changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13683/head:pull/13683 PR: https://git.openjdk.org/jdk/pull/13683 From lmesnik at openjdk.org Thu Apr 27 07:18:28 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 27 Apr 2023 07:18:28 GMT Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" [v2] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 04:59:00 GMT, David Holmes wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> fix > > test/lib-test/jdk/test/lib/process/ProcessToolsStartProcessTest.java line 79: > >> 77: System.out.print("FAILED: wrong number of lines in Consumer output\n"); >> 78: success = false; >> 79: System.out.print(out.getStdout()); > > Why isn't this printing output? fixed. > test/lib-test/jdk/test/lib/process/ProcessToolsStartProcessTest.java line 95: > >> 93: public static void main(String[] args) throws Exception { >> 94: if (args.length > 0) { >> 95: for (int i = 0; i < Integer.parseInt(args[0]); i++) { > > This will call parseInt on each iteration of the loop. fixed. > test/lib/jdk/test/lib/process/ProcessTools.java line 183: > >> 181: // It is needed to wait until stream is flushed after >> 182: // process is completed. >> 183: task.get(); > > This looks problematic - if you block here you are holding the object monitor as this is a synchronized method. You are right. 
It is needed to give a chance to write to this buffer. fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13683#discussion_r1178707275 PR Review Comment: https://git.openjdk.org/jdk/pull/13683#discussion_r1178707373 PR Review Comment: https://git.openjdk.org/jdk/pull/13683#discussion_r1178708269 From duke at openjdk.org Thu Apr 27 07:38:53 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Apr 2023 07:38:53 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v3] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:46:00 GMT, Fredrik Bredberg wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. >> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Put back assert in recurse_thaw_interpreted_frame > - Merge branch 'master' into freeze_thaw_interpreter_JDK-8300197_2023-01-19 > - Updated after review > - 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Will you provide any additional feedback @theRealAph and @RealFYang? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13477#issuecomment-1524957061 From rrich at openjdk.org Thu Apr 27 07:50:23 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 27 Apr 2023 07:50:23 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 09:42:52 GMT, Richard Reingruber wrote: > With this pr I would like to wrap the body of the macro `offset_of` as defined for gcc in parentheses to fix the described issues with Eclipse CDT. > As an alternative the [`(int)` cast in `byte_offset_of`](https://github.com/openjdk/jdk/blob/44d9f55d0b3c469988be6f1c47f0cfbc433c4490/src/hotspot/share/utilities/sizes.hpp#L59) could be removed. I preferred adding the parentheses to the gcc version of `offset_of` because the impact is smaller. > > Of course I'd rather have a local solution for the issue but I couldn't find one that didn't require changes in the source code. > > Testing: GHA Hi Thomas, > Looks good. Thanks for fixing this, I had noticed this in CDT myself. thanks for the review! > I never looked closely at this macro though. I wish we could just use offsetof. Indeed. > Also I wonder whether the ATTRIBUTE_ALIGNED(16) would be more correctly `alignas(klass)` but that is pre-existing. Probably :) At least it would be cleaner in my eyes than the hard coded 16. With this pr I'd like to address just the CDT issue though. Thanks, Richard. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13668#issuecomment-1524999747 From rkennke at openjdk.org Thu Apr 27 07:52:23 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 27 Apr 2023 07:52:23 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v64] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?'
is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions.
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
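The per-thread lock-stack described earlier in this message can be illustrated with a small sketch. This is a plain-Java model written for illustration only, not the HotSpot implementation: the class name `LockStackSketch` and its methods are hypothetical, and only the fixed capacity of 8, the linear ownership scan, and the fall-back to the slow path on overflow or recursion come from the description.

```java
class LockStackSketch {
    // Fixed size, matching the 8 elements mentioned in the description.
    static final int CAPACITY = 8;

    private final Object[] slots = new Object[CAPACITY];
    private int top = 0;

    // "Does the current thread own me?" answered by a linear identity scan.
    public boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == obj) {
                return true;
            }
        }
        return false;
    }

    // Fast-path lock: returns false when the stack is full or the object is
    // already present (recursive lock), so the caller would take the slow
    // path and inflate to a full monitor.
    public boolean tryPush(Object obj) {
        if (top == CAPACITY || contains(obj)) {
            return false;
        }
        slots[top++] = obj;
        return true;
    }

    // Fast-path unlock: removes the entry and closes the gap, if any.
    public boolean remove(Object obj) {
        for (int i = 0; i < top; i++) {
            if (slots[i] == obj) {
                System.arraycopy(slots, i + 1, slots, i, top - i - 1);
                slots[--top] = null;
                return true;
            }
        }
        return false;
    }

    public int size() {
        return top;
    }
}
```

Because the array is small and dense, the ownership query stays a short scan, which is the property the description relies on.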
> > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? 
| 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Suggestios by @dcubed-ojdk ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/5f927f9c..1323e958 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=63 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=62-63 Stats: 61 lines in 21 files changed: 11 ins; 10 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Apr 27 08:01:02 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Apr 2023 08:01:02 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 07:42:02 GMT, Richard Reingruber wrote: > Hi Thomas, > > > Looks good. Thanks for fixing this, I had noticed this in CDT myself. > > thanks for the review! > > > I never looked closely at this macro though. I wish we could just use offsetof. > > Indeed. > > > Also I wonder whether the ATTRIBUTE_ALIGNED(16) would be more correctly `alignas(klass)` but that is pre-existing. 
> > Probably :) At least it would be cleaner in my eyes than the hard coded 16. With this pr I'd like to address just the CDT issue though. > I believe the trick may not work if the type were artificially aligned to something larger than 16, e.g. struct alignas(64) XX. I guess that depends on what the compiler does when casting a pointer with a smaller alignment to something with a larger alignment. > Thanks, Richard. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13668#issuecomment-1525024724 From rrich at openjdk.org Thu Apr 27 08:58:28 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 27 Apr 2023 08:58:28 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 07:53:01 GMT, Thomas Stuefe wrote: > > > Also I wonder whether the ATTRIBUTE_ALIGNED(16) would be more correctly `alignas(klass)` but that is pre-existing. > > > > > > Probably :) At least it would be cleaner in my eyes than the hard coded 16. With this pr I'd like to address just the CDT issue though. > > I believe the trick may not work if the type were artificially aligned to something larger than 16, e.g. struct alignas(64) XX. I guess that depends on what the compiler does when casting a pointer with a smaller alignment to something with a larger alignment. Maybe. I'm not familiar enough with `alignas` right now. Also I would have to 'meditate' a bit about [the alignas-part in the hotspot-style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#alignas). Right now I wouldn't expect extended alignment of a type to affect member offsets. But that might be naive.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13668#issuecomment-1525140988 From pminborg at openjdk.org Thu Apr 27 09:13:33 2023 From: pminborg at openjdk.org (Per Minborg) Date: Thu, 27 Apr 2023 09:13:33 GMT Subject: Integrated: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > ### Specdiff > https://cr.openjdk.org/~pminborg/panama/21/v2/specdiff/overview-summary.html > > ### Javadoc > https://cr.openjdk.org/~pminborg/panama/21/v2/javadoc/api/java.base/java/lang/foreign/package-summary.html > > ### Tests > > Testing excludes tests on the "zero" platform. > > - [X] Tier1 > - [X] Tier2 > - [X] Tier3 > - [X] Tier4 > - [X] Tier5 > - [X] Tier6 (Except one test applications/jcstress/init.java as per below) > > > Exception in thread "main" java.lang.IllegalStateException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:66) > at org.openjdk.jcstress.vm.ContendedTestMain.main(ContendedTestMain.java:43) > Caused by: java.lang.ClassNotFoundException: /opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S63432/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/b895db82-df34-4e71-9b59-8e23465e55c1/runs/777d38e3-4dd3-42b6-95ca-97f1167b417c/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part3/classes/2/applications/jcstress/causality/d/applications/jcstress/JcstressRunner > at java.base/java.lang.Class.forName0(Native Method) > at 
java.base/java.lang.Class.forName(Class.java:497) > at java.base/java.lang.Class.forName(Class.java:476) > at org.openjdk.jcstress.util.Reflections.getClasses(Reflections.java:64) > ... 1 more This pull request has now been integrated. Changeset: cbccc4c8 Author: Per Minborg URL: https://git.openjdk.org/jdk/commit/cbccc4c8172797ea2f1b7c301d00add3f517546d Stats: 13420 lines in 270 files changed: 5100 ins; 6182 del; 2138 mod 8304265: Implementation of Foreign Function and Memory API (Third Preview) Co-authored-by: Maurizio Cimadamore Co-authored-by: Jorn Vernee Co-authored-by: Paul Sandoz Co-authored-by: Feilong Jiang Co-authored-by: Per Minborg Reviewed-by: erikj, jvernee, vlivanov, psandoz ------------- PR: https://git.openjdk.org/jdk/pull/13079 From sspitsyn at openjdk.org Thu Apr 27 09:14:29 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 27 Apr 2023 09:14:29 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. 
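The three rules stated above for virtual threads can be summarized in a small decision sketch. This is a model of the described behavior only, assuming three boolean inputs; the class `StopThreadSketch` and the method `checkVirtualThread` are hypothetical and are not part of JVMTI or HotSpot. The enum constants mirror the JVMTI error code names.

```java
class StopThreadSketch {
    // Names mirror the JVMTI error codes referenced in the description.
    enum Result {
        JVMTI_ERROR_NONE,
        JVMTI_ERROR_THREAD_NOT_SUSPENDED,
        JVMTI_ERROR_OPAQUE_FRAME
    }

    // Models the outcome for a virtual thread target, as described:
    // allowed if the thread is current, or suspended while mounted.
    static Result checkVirtualThread(boolean isCurrent, boolean isSuspended, boolean isMounted) {
        if (isCurrent) {
            return Result.JVMTI_ERROR_NONE;
        }
        if (!isSuspended) {
            // Neither current nor suspended.
            return Result.JVMTI_ERROR_THREAD_NOT_SUSPENDED;
        }
        if (!isMounted) {
            // Suspended in the unmounted state.
            return Result.JVMTI_ERROR_OPAQUE_FRAME;
        }
        return Result.JVMTI_ERROR_NONE;
    }
}
```

Other error conditions of `StopThread` (for example an invalid thread or exception argument) are deliberately left out of this sketch.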
> > A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tiers 1-6 are in progress. > Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in tier-6: >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: install_async_exception: set interrupt status for platform threads only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/956e8ee8..0113f034 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=04-05 Stats: 14 lines in 2 files changed: 9 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From fyang at openjdk.org Thu Apr 27 09:36:54 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 27 Apr 2023 09:36:54 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v3] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:46:00 GMT, Fredrik Bredberg wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned.
>> >> This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. >> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets at runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity tested using QEMU. > > Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Put back assert in recurse_thaw_interpreted_frame > - Merge branch 'master' into freeze_thaw_interpreter_JDK-8300197_2023-01-19 > - Updated after review > - 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Looks good to me. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13477#pullrequestreview-1403611621 From shade at openjdk.org Thu Apr 27 09:50:26 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Apr 2023 09:50:26 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v12] In-Reply-To: References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: <8Ak5D6_aeb2o7uOQKF3TZMQsgcA-gCDniHnI-7ZWnMs=.371ccce9-902e-4a03-a7c7-efe4907693fe@github.com> On Wed, 26 Apr 2023 20:22:54 GMT, Aleksey Shipilev wrote: >> Java API has the `Thread.sleep(millis, nanos)` method exposed to users.
The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. >> >> When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. >> >> Fortunately, the underlying code is almost ready for this, at least on POSIX side. I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. >> >> Additional testing: >> - [x] New regression test >> - [x] New benchmark >> - [x] Linux x86_64 `tier1` >> - [x] Linux AArch64 `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge branch 'master' into JDK-83050920-thread-sleep-subms > - Merge branch 'master' into JDK-83050920-thread-sleep-subms > - Fix Amazon copyright > - Merge branch 'master' into JDK-83050920-thread-sleep-subms > - Drop nanos_to_nanos_bounded > - Handle overflows > - More review comments > - Adjust test times > - Windows again > - Windows fixes: align(...) is only for power-of-two alignments > - ... 
and 16 more: https://git.openjdk.org/jdk/compare/35e7bc21...da8f0f8c All right, thank you all! I plan to integrate this some time today/tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1525286399 From stuefe at openjdk.org Thu Apr 27 10:10:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Apr 2023 10:10:23 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 08:41:35 GMT, Richard Reingruber wrote: > > > > Also I wonder whether the ATTRIBUTE_ALIGNED(16) would be more correctly `alignas(klass)` but that is pre-existing. > > > > > > > > > Probably :) At least it would be cleaner in my eyes than the hard coded 16. With this pr I'd like to address just the CDT issue though. > > > > > > I believe the trick may not work if the type were artificially aligned to something larger than 16, e.g. struct XX alignas(64). I guess that depends on what the compiler does when casting a pointer with a smaller alignment to something with a larger alignment. > > Maybe. I'm not familiar enough with `alignas` right now. Also I would have to 'meditate' a bit about [the alignas-part in the hotspot-style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#alignas). Right now I wouldn't expect extended alignment of a type to affect member offsets. But that might be naive. Checked with GCC and clang, it seems to work. But I think it's still UB (otherwise we could just omit the ALIGN specifier completely).
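A minimal sketch of the narrower point raised above, namely whether extended alignment of a type affects member offsets. It assumes standard-layout structs and the standard `offsetof` macro rather than HotSpot's `offset_of`; the names `Plain`, `Wide` and `offsets_match` are hypothetical and not taken from the JDK sources. On the common ABIs, `alignas` should change only the size and alignment of the type, not the offsets of its members:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only; not taken from HotSpot.
struct Plain {
  char tag;
  int  value;
};

// Same members, but the type as a whole is over-aligned to 64 bytes.
struct alignas(64) Wide {
  char tag;
  int  value;
};

// Member offsets are measured from the start of the object, so
// over-aligning the whole type is expected to leave them unchanged.
constexpr bool offsets_match() {
  return offsetof(Plain, value) == offsetof(Wide, value);
}

static_assert(offsets_match(), "extended alignment should not move members");
static_assert(alignof(Wide) == 64, "extended alignment is applied to the type");
static_assert(sizeof(Wide) % alignof(Wide) == 0, "size is padded to the alignment");
```

This only checks layout at compile time. It says nothing about whether casting an under-aligned dummy buffer to an over-aligned type, as the macro trick does, is well defined, which is the UB concern in the message above.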
------------- PR Comment: https://git.openjdk.org/jdk/pull/13668#issuecomment-1525338929 From alanb at openjdk.org Thu Apr 27 11:09:53 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 27 Apr 2023 11:09:53 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: References: Message-ID: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> On Thu, 27 Apr 2023 09:14:29 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tiers 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads.
>> >> Also, two JCK JVMTI tests are failing in the tier-6 : >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > install_async_exception: set interrupt status for platform threads only src/hotspot/share/prims/jvmti.xml line 11984: > 11982: > 11983: Information about the frame is not available (e.g. for native frames), > 11984: or the frame is not suitable for the requested operation. After re-reading the spec changes, I'm wondering if we can improve on "or the frame is not suitable for the requested operation". StopThread doesn't have a frame parameter. ForceEarlyReturn doesn't have a frame parameter either as it's implicit (the current frame). I wonder if wording something like this might be better: "or a function on a thread cannot be performed at the thread's current frame". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1178984510 From mdoerr at openjdk.org Thu Apr 27 12:13:29 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Apr 2023 12:13:29 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v23] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). 
> > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE).
Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: - Enable remaining foreign tests. - Adaptations for JDK-8304265. - Merge remote-tracking branch 'origin' into PPC64_Panama - Adaptation for JDK-8305668 - Merge remote-tracking branch 'origin' into PPC64_Panama - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. - Adaptation for JDK-8303022. - Adaptation for JDK-8303684. - Merge branch 'openjdk:master' into PPC64_Panama - Merge branch 'master' into PPC64_Panama - ... and 17 more: https://git.openjdk.org/jdk/compare/1be80a44...84e9dd2f ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=22 Stats: 2556 lines in 70 files changed: 2393 ins; 6 del; 157 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From coleenp at openjdk.org Thu Apr 27 12:13:56 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Apr 2023 12:13:56 GMT Subject: RFR: 8306851: Move Method access flags In-Reply-To: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> References: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> Message-ID: On Thu, 27 Apr 2023 04:28:31 GMT, David Holmes wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. 
I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > src/hotspot/share/classfile/classFileParser.cpp line 2741: > >> 2739: parsed_annotations.apply_to(methodHandle(THREAD, m)); >> 2740: >> 2741: if (is_hidden() && !m->is_hidden()) { // Mark methods in hidden classes as 'hidden'. > > This seems odd - how would m already be marked hidden? And why do we care? The check-and-branch is more expensive than just setting the field. This hidden flag is set by both the Method attribute during class file parsing and set to true if the whole class is hidden. My original version of the assert_is_safe() function only allowed a flag to be set once, but I've changed it to only set to true. So I can undo this. > src/hotspot/share/oops/method.cpp line 735: > >> 733: case Bytecodes::_jsr: >> 734: if (bcs.dest() < bcs.next_bci()) { >> 735: return set_has_loops(); > > I don't understand the new return logic here. The break gets us out of the switch but we are still in the while loop, but the return takes us all the way out. ??? Yes, I had a version of this change where we only let has_loops get set once, and this code set it over and over again, which is unnecessary. Once it's true, it stays true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1179049959 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1179044448 From mdoerr at openjdk.org Thu Apr 27 12:54:55 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Apr 2023 12:54:55 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Revert unintended formatting changes. Fix comment. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/84e9dd2f..7b85fca9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=22-23 Stats: 189 lines in 6 files changed: 5 ins; 51 del; 133 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From coleenp at openjdk.org Thu Apr 27 12:57:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Apr 2023 12:57:23 GMT Subject: RFR: 8306851: Move Method access flags [v2] In-Reply-To: References: Message-ID: > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. > > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add is prefixes and some cleanups. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13654/files - new: https://git.openjdk.org/jdk/pull/13654/files/4b92aacf..fc5bcaa6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=00-01 Stats: 37 lines in 8 files changed: 1 ins; 0 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/13654.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13654/head:pull/13654 PR: https://git.openjdk.org/jdk/pull/13654 From coleenp at openjdk.org Thu Apr 27 13:00:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Apr 2023 13:00:23 GMT Subject: RFR: 8306851: Move Method access flags [v2] In-Reply-To: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> References: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> Message-ID: On Thu, 27 Apr 2023 04:35:10 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add is prefixes and some cleanups. > > src/hotspot/share/oops/constMethodFlags.hpp line 53: > >> 51: flag(has_type_annotations , 1 << 9) \ >> 52: flag(has_default_annotations , 1 << 10) \ >> 53: flag(caller_sensitive , 1 << 11) \ > > Nit: we should consistently use either `x` or `is_x` for `x` in `overpass, caller_sensitive, hidden, scoped, ...` That's my preference too, but I was trying to not change all the callers to is_caller_sensitive, for example, or providing a set of wrappers to change the "x" calls to "is_x" calls to the flags interface. 
> src/hotspot/share/oops/method.hpp line 808: > >> 806: >> 807: bool is_hidden() const { return constMethod()->is_hidden(); } >> 808: void set_hidden() { constMethod()->set_is_hidden(); } > > The naming is inconsistent here regarding `is` - either both should have it or neither IMO. This one is easy to fix because there aren't calls everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1179054655 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1179057923 From coleenp at openjdk.org Thu Apr 27 13:00:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Apr 2023 13:00:55 GMT Subject: RFR: 8306851: Move Method access flags [v2] In-Reply-To: References: <4MFJ-FafLsw4MTThURBlLZAvFZauwOd0VPklgivnehU=.4a24c95f-8c5f-416b-a63d-c21d58501dae@github.com> Message-ID: <8hc_LuISgfyGtsyx-_GTeuhTQ5xyu2JLM5heYgREKIA=.a5acf58c-16bd-4a3e-b4db-aca3021e5c73@github.com> On Thu, 27 Apr 2023 12:09:21 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/constMethodFlags.hpp line 53: >> >>> 51: flag(has_type_annotations , 1 << 9) \ >>> 52: flag(has_default_annotations , 1 << 10) \ >>> 53: flag(caller_sensitive , 1 << 11) \ >> >> Nit: we should consistently use either `x` or `is_x` for `x` in `overpass, caller_sensitive, hidden, scoped, ...` > > That's my preference too, but I was trying to not change all the callers to is_caller_sensitive, for example, or providing a set of wrappers to change the "x" calls to "is_x" calls to the flags interface. The 'is' in not_x_compilable can be added without too much fanout of lines and files that I don't want to change, so I added the 'is' to these. >> src/hotspot/share/oops/method.cpp line 735: >> >>> 733: case Bytecodes::_jsr: >>> 734: if (bcs.dest() < bcs.next_bci()) { >>> 735: return set_has_loops(); >> >> I don't understand the new return logic here. The break gets us out of the switch but we are still in the while loop, but the return takes us all the way out. ??? 
> > Yes, I had a version of this change where we only let has_loops get set once, and this code set it over and over again, which is unnecessary. Once it's true, it stays true. The early break and not resetting has_loops once it's true, saves the atomic access also. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1179084771 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1179080249 From stuefe at openjdk.org Thu Apr 27 13:13:26 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Apr 2023 13:13:26 GMT Subject: RFR: JDK-8305387: JDK-8301995 breaks arm 32-bit [v4] In-Reply-To: <0t8Fp7BDksOpJK4c2fU93aMSvGS8SthjDFk1LM4BFi0=.4614b9e8-407f-4071-ba68-29c18f2b929a@github.com> References: <0t8Fp7BDksOpJK4c2fU93aMSvGS8SthjDFk1LM4BFi0=.4614b9e8-407f-4071-ba68-29c18f2b929a@github.com> Message-ID: On Tue, 25 Apr 2023 09:38:14 GMT, Aleksei Voitylov wrote: >> Provides missing implementation for arm32. >> >> Testing: hotspot/jtreg. > > Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision: > > address Matias comments @voitylov Thanks a lot for fixing this! I gave it a cursory look, it looks good to me as well. Also tested it on my Arm box and it works. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13596#issuecomment-1525661861 From mdoerr at openjdk.org Thu Apr 27 13:16:22 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Apr 2023 13:16:22 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 12:54:55 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
>> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size.
Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Revert unintended formatting changes. Fix comment. Adapted for JDK21, now. All tests have passed. My IDE had changed the formatting which is reverted, now. (I've kept the minor formatting changes in TestDontRelease.java because it looks better.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1525655158 From coleenp at openjdk.org Thu Apr 27 14:21:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Apr 2023 14:21:23 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. > > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Remove bool argument from ConstMethodFlags.set function. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13654/files - new: https://git.openjdk.org/jdk/pull/13654/files/fc5bcaa6..6687cc0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=01-02 Stats: 9 lines in 2 files changed: 0 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13654.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13654/head:pull/13654 PR: https://git.openjdk.org/jdk/pull/13654 From coleenp at openjdk.org Thu Apr 27 16:33:22 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Apr 2023 16:33:22 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 14:21:23 GMT, Coleen Phillimore wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove bool argument from ConstMethodFlags.set function. @dougxc can you look at the JVMCI changes? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13654#issuecomment-1525977245 From lmesnik at openjdk.org Thu Apr 27 16:35:59 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 27 Apr 2023 16:35:59 GMT Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" [v3] In-Reply-To: References: Message-ID: > The ProcessTools.startProcess (...) has been updated to completely read streams after process has been completed. > The test was updated to run 5 times with different number of lines and line sizes. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: move buffers registration before pumping start point ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13683/files - new: https://git.openjdk.org/jdk/pull/13683/files/10f1c5c2..d02b889a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13683&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13683&range=01-02 Stats: 28 lines in 1 file changed: 10 ins; 9 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/13683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13683/head:pull/13683 PR: https://git.openjdk.org/jdk/pull/13683 From dcubed at openjdk.org Thu Apr 27 17:05:59 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 27 Apr 2023 17:05:59 GMT Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" [v3] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 16:35:59 GMT, Leonid Mesnik wrote: >> The ProcessTools.startProcess (...) has been updated to completely read streams after process has been completed. >> The test was updated to run 5 times with different number of lines and line sizes. 
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > move buffers registration before pumping start point What Mach5 testing has been done with this fix? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13683#issuecomment-1526034233 From lmesnik at openjdk.org Thu Apr 27 17:31:23 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 27 Apr 2023 17:31:23 GMT Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" [v3] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 16:35:59 GMT, Leonid Mesnik wrote: >> The ProcessTools.startProcess (...) has been updated to completely read streams after process has been completed. >> The test was updated to run 5 times with different number of lines and line sizes. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > move buffers registration before pumping start point The testing: - tier1 - tier5 to verify that nothing is broken, - ran jdk/test/lib/process/ProcessToolsStartProcessTest.java with 50 iterations instead of 5 on each platform 100 times - manually added some Thread.sleep calls into startProcess and verified that no failures appear ------------- PR Comment: https://git.openjdk.org/jdk/pull/13683#issuecomment-1526061664 From cjplummer at openjdk.org Thu Apr 27 19:06:54 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 27 Apr 2023 19:06:54 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> Message-ID: <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> On Thu, 27 Apr 2023 10:58:14 GMT, Alan
Bateman wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> install_async_exception: set interrupt status for platform threads only > > src/hotspot/share/prims/jvmti.xml line 11984: > >> 11982: >> 11983: Information about the frame is not available (e.g. for native frames), >> 11984: or the frame is not suitable for the requested operation. > > After re-reading the spec changes, I'm wondering if we can improve on "or the frame is not suitable for the requested operation". StopThread doesn't have a frame parameter. ForceEarlyReturn doesn't have a frame parameter either as it's implicit (the current frame). I wonder if wording something like this might be better: > "or a function on a thread cannot be performed at the thread's current frame". The wording starts off with "Information about the frame...", and you haven't suggested to change that to "the current frame". We should be consistent. Can't we just change both "the frame" references to "the current frame", and leave the rest the same as what Serguei has in place here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1179561498 From alanb at openjdk.org Thu Apr 27 19:07:52 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 27 Apr 2023 19:07:52 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> Message-ID: On Thu, 27 Apr 2023 18:49:40 GMT, Chris Plummer wrote: >> src/hotspot/share/prims/jvmti.xml line 11984: >> >>> 11982: >>> 11983: Information about the frame is not available (e.g. 
for native frames), >>> 11984: or the frame is not suitable for the requested operation. >> >> After re-reading the spec changes, I'm wondering if we can improve on "or the frame is not suitable for the requested operation". StopThread doesn't have a frame parameter. ForceEarlyReturn doesn't have a frame parameter either as it's implicit (the current frame). I wonder if wording something like this might be better: >> "or a function on a thread cannot be performed at the thread's current frame". > > The wording starts off with "Information about the frame...", and you haven't suggested to change that to "the current frame". We should be consistent. Can't we just change both "the frame" references to "the current frame", and leave the rest the same as what Serguei has in place here? I think the first part is okay because it's for functions that are about frames. The NotifyFramePop specifies the depth so it may not be the current frame. The second usage is the functions on a thread where we might do better than "not suitable". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1179571945 From cjplummer at openjdk.org Thu Apr 27 19:17:03 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 27 Apr 2023 19:17:03 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> Message-ID: On Thu, 27 Apr 2023 18:59:34 GMT, Alan Bateman wrote: >> The wording starts off with "Information about the frame...", and you haven't suggested to change that to "the current frame". We should be consistent. Can't we just change both "the frame" references to "the current frame", and leave the rest the same as what Serguei has in place here? 
> > I think the first part is okay because it's for functions that are about frames. The NotifyFramePop specifies the depth so it may not be the current frame. The second usage is the functions on a thread where we might do better than "not suitable". Ok. How about "the function cannot be performed on the thread's current frame". We already have a couple references to "the function" in the error codes section. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1179584919 From cslucas at openjdk.org Thu Apr 27 20:52:23 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 27 Apr 2023 20:52:23 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Sat, 22 Apr 2023 01:42:41 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - Merge master >> - Add support for rematerializing scalar replaced objects participating in allocation merges > > src/hotspot/share/code/debugInfo.hpp line 199: > >> 197: // ObjectValue describing an object that was scalar replaced. 
>> 198: >> 199: class ObjectMergeValue: public ObjectValue { > > I find the decision to subclass`ObjectValue` confusing and error prone: now `is_object()` returns true for `ObjectMergeValue`, but you have to apply the selector first to turn it into `ObjectValue`. And now the order of checks matter, so you always have to perform `is_object_merge()` first and then follow it with `is_object()` guard. > > You have 3 flavors of `ObjectValue` now: > * good old `ObjectValue`; > * `ObjectMergeValue` > * merge candidates (`ObjectMergeCandidateValue`?) > > Does it make sense to introduce 3 different subclasses under `ObjectValue` to clearly distinguish the scenarios? Hi @iwanowww . I finished implementing a version of this like the illustration below (I didn't add a Candidate class). ScopeValue ObjectValue ObjectAllocationValue AutoBoxObjectValue ObjectMergeValue Here are some observations: - I don't think ObjectMergeValue should be under ObjectValue. The two classes only have two fields in common (_id and _visited). I think it should be a subclass of ScopeValue. - ObjectCandidateValue would need to go under ObjectAllocationValue because it essentially _is_ an ObjectAllocationValue in most aspects. - I didn't add a ObjectCandidateValue class because that class would need to go under ObjectAllocationValue and we would still need to do an "is_object_candidate" before all "is_object_allocation" and we would end up in much the situation that we want to avoid - needing to do is_object_merge before is_object. - It seems the best place to flag an object as candidate is really in ObjectAllocationValue. What do you think? As I said, I already have the code, if you want I can push it and you take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1179649780 From dcubed at openjdk.org Thu Apr 27 21:50:17 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 27 Apr 2023 21:50:17 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v64] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 07:52:23 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?'
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. 
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
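The lock-stack scheme described above can be illustrated with a small, self-contained sketch. It is not the code from this PR: `Obj`, `LockStack`, `try_fast_lock`, `fast_unlock` and the two-bit header encoding are hypothetical simplifications, the header is modelled as a plain atomic integer, and both inflation to a monitor and the ANONYMOUS_OWNER handling are reduced to "return `false` and let the caller take the slow path".

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified object header: only the two low lock bits matter here.
constexpr uintptr_t LOCK_MASK = 0x3;
constexpr uintptr_t UNLOCKED  = 0x1;  // 01 = unlocked, 00 = fast-locked

struct Obj {
    std::atomic<uintptr_t> mark{UNLOCKED};
};

// Per-thread, fixed-size stack of fast-locked objects (8 entries, as in the description).
class LockStack {
    static constexpr std::size_t CAPACITY = 8;
    Obj* _elems[CAPACITY];
    std::size_t _top = 0;

public:
    bool is_full() const { return _top == CAPACITY; }

    // "Does the current thread own me?" is a linear scan of a very small array.
    bool contains(const Obj* o) const {
        for (std::size_t i = 0; i < _top; ++i) {
            if (_elems[i] == o) return true;
        }
        return false;
    }

    // Fast path. Returns false whenever the caller would have to take the
    // slow path: lock-stack overflow, recursion, or contention.
    bool try_fast_lock(Obj* o) {
        if (is_full()) return false;
        uintptr_t old_mark = o->mark.load(std::memory_order_relaxed);
        if ((old_mark & LOCK_MASK) != UNLOCKED) return false;  // already locked
        uintptr_t new_mark = old_mark & ~LOCK_MASK;            // low bits -> 00
        if (!o->mark.compare_exchange_strong(old_mark, new_mark,
                                             std::memory_order_acquire)) {
            return false;                                      // lost a race
        }
        _elems[_top++] = o;
        return true;
    }

    // Assumes structured (LIFO) locking and an uninflated lock.
    void fast_unlock(Obj* o) {
        assert(_top > 0 && _elems[_top - 1] == o);
        --_top;
        o->mark.fetch_or(UNLOCKED, std::memory_order_release); // low bits -> 01
    }
};
```

In this model a second `try_fast_lock` on an object the thread already holds fails, which mirrors the statement that recursive locking is not supported by the fast path and leads to inflation.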
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Suggestios by @dcubed-ojdk

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 213:

> 211: // refer to Threads::owning_thread_from_monitor_owner
> 212: public JavaThread owningThreadFromMonitor(Address o) {
> 213: assert(VM.getVM().getCommandLineFlag("LockingMode").getInt() != 2 /* LM_LIGHTWEIGHT */);

nit: indent is too short by 2 spaces.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1179702987

From vlivanov at openjdk.org Thu Apr 27 23:50:55 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 27 Apr 2023 23:50:55 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10]
In-Reply-To:
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

On Thu, 27 Apr 2023 20:33:38 GMT, Cesar Soares Lucas wrote:

>> src/hotspot/share/code/debugInfo.hpp line 199:
>>
>>> 197: // ObjectValue describing an object that was scalar replaced.
>>> 198: >>> 199: class ObjectMergeValue: public ObjectValue { >> >> I find the decision to subclass`ObjectValue` confusing and error prone: now `is_object()` returns true for `ObjectMergeValue`, but you have to apply the selector first to turn it into `ObjectValue`. And now the order of checks matter, so you always have to perform `is_object_merge()` first and then follow it with `is_object()` guard. >> >> You have 3 flavors of `ObjectValue` now: >> * good old `ObjectValue`; >> * `ObjectMergeValue` >> * merge candidates (`ObjectMergeCandidateValue`?) >> >> Does it make sense to introduce 3 different subclasses under `ObjectValue` to clearly distinguish the scenarios? > > Hi @iwanowww . I finished implementing a version of this like the illustration below (I didn't add a Candidate class). > > > ScopeValue > ObjectValue > ObjectAllocationValue > AutoBoxObjectValue > ObjectMergeValue > > > Here are some observations: > > - I don't think ObjectMergeValue should be under ObjectValue. The two classes only have two fields in common (_id and _visited). I think it should be a subclass of ScopeValue. > - ObjectCandidateValue would need to go under ObjectAllocationValue because it essentially _is_ an ObjectAllocationValue in most aspects. > - I didn't add a ObjectCandidateValue class because that class would need to go under ObjectAllocationValue and we would still need to do an "is_object_candidate" before all "is_object_allocation" and we would end up in much the situation that we want to avoid - needing to do is_object_merge before is_object. > - It seems the best place to flag an object as candidate is really in ObjectAllocationValue. > > What do you think? As I said, I already have the code, if you want I can push it and you take a look. Can `ObjectCandidateValue` be a wrapper around a `ObjectAllocationValue`? It does make sense to separate `ObjectMergeValue` and `ObjectValue`. 
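The ordering hazard discussed in this thread can be seen in a stripped-down model. The classes below are hypothetical stand-ins, not the HotSpot `debugInfo.hpp` definitions, and `classify_object_first` and `Kind` are made up for illustration. They only show why a merge type that subclasses `ObjectValue` forces callers to test `is_object_merge()` before `is_object()`, and how making it a sibling under `ScopeValue` removes that dependency.

```cpp
#include <cassert>

enum class Kind { Other, Object, Merge };

// Layout under review: ObjectMergeValue derives from ObjectValue.
namespace subclassed {
struct ScopeValue {
    virtual ~ScopeValue() = default;
    virtual bool is_object() const { return false; }
    virtual bool is_object_merge() const { return false; }
};
struct ObjectValue : ScopeValue {
    bool is_object() const override { return true; }
};
struct ObjectMergeValue : ObjectValue {  // inherits is_object() == true
    bool is_object_merge() const override { return true; }
};
}  // namespace subclassed

// Alternative raised in the thread: ObjectMergeValue directly under ScopeValue.
namespace sibling {
struct ScopeValue {
    virtual ~ScopeValue() = default;
    virtual bool is_object() const { return false; }
    virtual bool is_object_merge() const { return false; }
};
struct ObjectValue : ScopeValue {
    bool is_object() const override { return true; }
};
struct ObjectMergeValue : ScopeValue {   // is_object() stays false
    bool is_object_merge() const override { return true; }
};
}  // namespace sibling

// A caller that happens to test is_object() first.
template <class V>
Kind classify_object_first(const V& v) {
    if (v.is_object()) return Kind::Object;
    if (v.is_object_merge()) return Kind::Merge;
    return Kind::Other;
}
```

With the subclassed layout, `classify_object_first` reports a merge value as a plain object; with the sibling layout the same caller classifies it correctly regardless of check order.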
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1179798496 From vlivanov at openjdk.org Thu Apr 27 23:50:56 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Apr 2023 23:50:56 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 27 Apr 2023 23:35:02 GMT, Vladimir Ivanov wrote: >> Hi @iwanowww . I finished implementing a version of this like the illustration below (I didn't add a Candidate class). >> >> >> ScopeValue >> ObjectValue >> ObjectAllocationValue >> AutoBoxObjectValue >> ObjectMergeValue >> >> >> Here are some observations: >> >> - I don't think ObjectMergeValue should be under ObjectValue. The two classes only have two fields in common (_id and _visited). I think it should be a subclass of ScopeValue. >> - ObjectCandidateValue would need to go under ObjectAllocationValue because it essentially _is_ an ObjectAllocationValue in most aspects. >> - I didn't add a ObjectCandidateValue class because that class would need to go under ObjectAllocationValue and we would still need to do an "is_object_candidate" before all "is_object_allocation" and we would end up in much the situation that we want to avoid - needing to do is_object_merge before is_object. >> - It seems the best place to flag an object as candidate is really in ObjectAllocationValue. >> >> What do you think? As I said, I already have the code, if you want I can push it and you take a look. > > Can `ObjectCandidateValue` be a wrapper around a `ObjectAllocationValue`? > > It does make sense to separate `ObjectMergeValue` and `ObjectValue`. I need to study the code in more detail. Seems like I'm missing something important here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1179798907 From sspitsyn at openjdk.org Fri Apr 28 00:56:23 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 28 Apr 2023 00:56:23 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> Message-ID: On Thu, 27 Apr 2023 19:14:56 GMT, Chris Plummer wrote: >> I think the first part is okay because it's for functions that are about frames. The NotifyFramePop specifies the depth so it may not be the current frame. The second usage is the functions on a thread where we might do better than "not suitable". > > Ok. How about "the function cannot be performed on the thread's current frame". We already have a couple references to "the function" in the error codes section. We have two suggestions: > - "or a function on a thread cannot be performed at the thread's current frame". > - "the function cannot be performed on the thread's current frame." So, we need to pick one. The second one looks simpler to me but I'm not completely sure that it reflects the full meaning correctly. I wonder about a mix of the two suggestions above: > "the function cannot be performed at the thread's current frame." 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1179833278 From sspitsyn at openjdk.org Fri Apr 28 00:56:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 28 Apr 2023 00:56:24 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> Message-ID: On Fri, 28 Apr 2023 00:46:23 GMT, Serguei Spitsyn wrote: >> Ok. How about "the function cannot be performed on the thread's current frame". We already have a couple references to "the function" in the error codes section. > > We have two suggestions: >> - "or a function on a thread cannot be performed at the thread's current frame". >> - "the function cannot be performed on the thread's current frame." > > So, we need to pick one. The second one looks simpler to me but > I'm not completely sure that it reflects the full meaning correctly. > I wonder about a mix of the two suggestions above: > >> "the function cannot be performed at the thread's current frame." We need to account for the `SetLocalXXX` functions with the `depth` parameter which also return `OPAQUE_FRAME` error code for virtual frames. My concern is if the "current frame" part is fully correct. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1179834952 From kbarrett at openjdk.org Fri Apr 28 03:05:52 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 28 Apr 2023 03:05:52 GMT Subject: RFR: 8305566: Change StringDedup thread to derive from JavaThread [v2] In-Reply-To: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> References: <-k-_JovF26G4lOTq2AvCVxvDgwnqpD4-GSbSYCbDcn4=.e82221d2-9fd8-4aed-9a11-6f5ccc09c669@github.com> Message-ID: On Mon, 24 Apr 2023 09:25:01 GMT, Kim Barrett wrote: >> Please review this change to the string deduplication thread to make it a kind >> of JavaThread rather than a ConcurrentGCThread. There are several pieces to >> this change: >> >> (1) New class StringDedupThread (derived from JavaThread), separate from >> StringDedup::Processor (which is now just a CHeapObj instead of deriving from >> ConcurrentGCThread). The thread no longer needs to or supports being stopped, >> like other similar threads. It also needs to be started later, once Java >> threads are supported. Also don't need an explicit visitor, since it will be >> in the normal Java threads list. This separation made the changeover a little >> cleaner to develop, and made the servicability support a little cleaner too. >> >> (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, >> instead of using the SuspendibleThreadSet facility. >> >> (3) Because we're using ThreadBlockInVM, which has a different usage style >> from STS, the tracking of time spent by the processor blocked for safepoints >> doesn't really work. It's not very important anyway, since normal thread >> descheduling can also affect the normal processing times being gathered and >> reported. So we just drop the so-called "blocked" time and associated >> infrastructure, simplifying Stat tracking a bit. Also renamed the >> "concurrent" stat to be "active", since it's all in a JavaThread now. 
>> >> (4) To avoid #include problems, moved the definition of >> JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, >> where one of the functions it calls also is defined. >> >> (5) Added servicability support for the new thread. >> >> Testing: >> mach5 tier1-3 with -XX:+UseStringDeduplication. >> The test runtime/cds/DeterministicDump.java fails intermittently with that >> option, which is not surprising - see JDK-8306712. >> >> I was never able to reproduce the failure; it's likely quite timing sensitive. >> The fix of changing the type is based on StefanK's comment that ZResurrection >> doesn't expect a non-Java thread to perform load-barriers. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > fix include order Thanks all for reviews. I've changed the bug and PR summaries as requested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13607#issuecomment-1526905175 From kbarrett at openjdk.org Fri Apr 28 03:26:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 28 Apr 2023 03:26:53 GMT Subject: RFR: 8305566: Change StringDedup thread to derive from JavaThread [v3] In-Reply-To: References: Message-ID: > Please review this change to the string deduplication thread to make it a kind > of JavaThread rather than a ConcurrentGCThread. There are several pieces to > this change: > > (1) New class StringDedupThread (derived from JavaThread), separate from > StringDedup::Processor (which is now just a CHeapObj instead of deriving from > ConcurrentGCThread). The thread no longer needs to or supports being stopped, > like other similar threads. It also needs to be started later, once Java > threads are supported. Also don't need an explicit visitor, since it will be > in the normal Java threads list. This separation made the changeover a little > cleaner to develop, and made the servicability support a little cleaner too. 
> > (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, > instead of using the SuspendibleThreadSet facility. > > (3) Because we're using ThreadBlockInVM, which has a different usage style > from STS, the tracking of time spent by the processor blocked for safepoints > doesn't really work. It's not very important anyway, since normal thread > descheduling can also affect the normal processing times being gathered and > reported. So we just drop the so-called "blocked" time and associated > infrastructure, simplifying Stat tracking a bit. Also renamed the > "concurrent" stat to be "active", since it's all in a JavaThread now. > > (4) To avoid #include problems, moved the definition of > JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, > where one of the functions it calls also is defined. > > (5) Added servicability support for the new thread. > > Testing: > mach5 tier1-3 with -XX:+UseStringDeduplication. > The test runtime/cds/DeterministicDump.java fails intermittently with that > option, which is not surprising - see JDK-8306712. > > I was never able to reproduce the failure; it's likely quite timing sensitive. > The fix of changing the type is based on StefanK's comment that ZResurrection > doesn't expect a non-Java thread to perform load-barriers. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into jt-strdedup - fix include order - fix stray tab - move is_active_Java_thread - copyrights - servicabilty support - use JavaThread - separate thread class - simplify init - do not pass around STS joiner - ... 
and 2 more: https://git.openjdk.org/jdk/compare/8957f67e...da07a420 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13607/files - new: https://git.openjdk.org/jdk/pull/13607/files/f17cc6be..da07a420 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13607&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13607&range=01-02 Stats: 46789 lines in 794 files changed: 30868 ins; 11396 del; 4525 mod Patch: https://git.openjdk.org/jdk/pull/13607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13607/head:pull/13607 PR: https://git.openjdk.org/jdk/pull/13607 From kbarrett at openjdk.org Fri Apr 28 03:35:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 28 Apr 2023 03:35:53 GMT Subject: Integrated: 8305566: Change StringDedup thread to derive from JavaThread In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 08:24:53 GMT, Kim Barrett wrote: > Please review this change to the string deduplication thread to make it a kind > of JavaThread rather than a ConcurrentGCThread. There are several pieces to > this change: > > (1) New class StringDedupThread (derived from JavaThread), separate from > StringDedup::Processor (which is now just a CHeapObj instead of deriving from > ConcurrentGCThread). The thread no longer needs to or supports being stopped, > like other similar threads. It also needs to be started later, once Java > threads are supported. Also don't need an explicit visitor, since it will be > in the normal Java threads list. This separation made the changeover a little > cleaner to develop, and made the servicability support a little cleaner too. > > (2) The Processor now uses the ThreadBlockInVM idiom to be safepoint polite, > instead of using the SuspendibleThreadSet facility. > > (3) Because we're using ThreadBlockInVM, which has a different usage style > from STS, the tracking of time spent by the processor blocked for safepoints > doesn't really work. 
It's not very important anyway, since normal thread > descheduling can also affect the normal processing times being gathered and > reported. So we just drop the so-called "blocked" time and associated > infrastructure, simplifying Stat tracking a bit. Also renamed the > "concurrent" stat to be "active", since it's all in a JavaThread now. > > (4) To avoid #include problems, moved the definition of > JavaThread::is_active_Java_thread from the .hpp file to the .inline.hpp file, > where one of the functions it calls also is defined. > > (5) Added servicability support for the new thread. > > Testing: > mach5 tier1-3 with -XX:+UseStringDeduplication. > The test runtime/cds/DeterministicDump.java fails intermittently with that > option, which is not surprising - see JDK-8306712. > > I was never able to reproduce the failure; it's likely quite timing sensitive. > The fix of changing the type is based on StefanK's comment that ZResurrection > doesn't expect a non-Java thread to perform load-barriers. This pull request has now been integrated. Changeset: d3abfec8 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/d3abfec8b7ce901150952356f9f1109d09a8cb2a Stats: 440 lines in 18 files changed: 193 ins; 146 del; 101 mod 8305566: Change StringDedup thread to derive from JavaThread Reviewed-by: stefank, cjplummer, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/13607 From fyang at openjdk.org Fri Apr 28 04:11:24 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 28 Apr 2023 04:11:24 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v3] In-Reply-To: References: Message-ID: <_eZoblPOvxkwGY38lHkL29QyncwAGIOkUNBu-GzDYlY=.f05d9d52-4f26-4064-be79-34276ce6ab38@github.com> On Wed, 26 Apr 2023 10:40:28 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. 
>> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > spaces src/hotspot/cpu/riscv/assembler_riscv.hpp line 1955: > 1953: target &= ~mask; > 1954: target |= val; > 1955: sd_c_instr(a, target); Compressed instructions are supposed to be located at some addresses at least 2-bytes aligned. So I am thinking that there shouldn't be any unaligned access happening here. src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 349: > 347: #define assert_addr_alignment_cond(cond, addr) assert(!(cond) || is_aligned(addr, 4), "bad addr alignment: " INTPTR_FORMAT, p2i(addr)) > 348: #define assert_addr_alignment(addr) assert_addr_alignment_cond(true, addr) > 349: #define assert_offset_alignment_cond(cond, offset) assert(!(cond) || ((offset) % 4) == 0, "bad offset alignment: %d", offset) Looks like these macros are not used anywhere? 
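For readers following the alignment discussion, here is a minimal sketch of the two ideas that come up in this review: checking whether an address is 4-byte aligned (what the assertion macros quoted above test), and reading or writing a 32-bit value without assuming alignment. The helper names are hypothetical and this is not the HotSpot implementation of get_native_uX/put_native_uX; using memcpy for the access is an assumption about one common way to do it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers, named for illustration only; they are not the
// HotSpot get_native_uX/put_native_uX implementations.

// True if addr is a multiple of alignment (alignment must be a power of two).
inline bool is_aligned_sketch(const void* addr, std::uintptr_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(addr) & (alignment - 1)) == 0;
}

// Read a 32-bit value from a possibly misaligned address. memcpy leaves it to
// the compiler to pick accesses that are legal for the target.
inline std::uint32_t load_u4_sketch(const void* addr) {
  std::uint32_t v;
  std::memcpy(&v, addr, sizeof(v));
  return v;
}

// Write a 32-bit value to a possibly misaligned address.
inline void store_u4_sketch(void* addr, std::uint32_t v) {
  std::memcpy(addr, &v, sizeof(v));
}
```

On hardware that traps on misaligned accesses, a direct 32-bit load through a cast pointer at an odd address is what would show up in counters such as trp_lam; the memcpy form avoids relying on the cast.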
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1179903107 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1179903332 From dholmes at openjdk.org Fri Apr 28 06:00:24 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 28 Apr 2023 06:00:24 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 14:21:23 GMT, Coleen Phillimore wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove bool argument from ConstMethodFlags.set function. Thanks for the updates. I understand about the fanout from making `is` naming fully consistent. src/hotspot/share/oops/method.hpp line 875: > 873: bool is_not_c1_osr_compilable() const { return is_not_c1_compilable(); } > 874: void set_is_not_c1_osr_compilable() { set_is_not_c1_compilable(); } > 875: void clear_is_not_c1_osr_compilable() { clear_is_not_c1_compilable(); } Nit: don't need extra spaces after `{` ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13654#pullrequestreview-1405267857 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1179956725 From rrich at openjdk.org Fri Apr 28 06:25:57 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 28 Apr 2023 06:25:57 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: <1YlI_TkZBDspcXnRnIuRRDvguV3F1BCT8lQvJN-SxRM=.7b3c7953-3445-40bb-80cd-56af29a9cd97@github.com> On Wed, 26 Apr 2023 09:42:52 GMT, Richard Reingruber wrote: > With this pr I would like to wrap the body of the macro `offset_of` as defined for gcc in parentheses to fix the described issues with Eclipse CDT. > As an alternative the [`(int)` cast in `byte_offset_of`](https://github.com/openjdk/jdk/blob/44d9f55d0b3c469988be6f1c47f0cfbc433c4490/src/hotspot/share/utilities/sizes.hpp#L59) could be removed. I preferred adding the parentheses to the gcc version of `offset_of` because the impact is smaller. > > Of course I'd rather have a local solution for the issue but I couldn't find one that didn't require changes in the source code. > > Testing: GHA Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13668#issuecomment-1527035901 From rrich at openjdk.org Fri Apr 28 06:25:59 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 28 Apr 2023 06:25:59 GMT Subject: Integrated: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: <6Fbu4NCtNoUbybo7-N6LuyjhiqJoYLVcKM-oW9C6FQg=.ad50729b-889a-4276-b134-80270210a1cf@github.com> On Wed, 26 Apr 2023 09:42:52 GMT, Richard Reingruber wrote: > With this pr I would like to wrap the body of the macro `offset_of` as defined for gcc in parentheses to fix the described issues with Eclipse CDT. > As an alternative the [`(int)` cast in `byte_offset_of`](https://github.com/openjdk/jdk/blob/44d9f55d0b3c469988be6f1c47f0cfbc433c4490/src/hotspot/share/utilities/sizes.hpp#L59) could be removed. 
I preferred adding the parentheses to the gcc version of `offset_of` because the impact is smaller. > > Of course I'd rather have a local solution for the issue but I couldn't find one that didn't require changes in the source code. > > Testing: GHA This pull request has now been integrated. Changeset: eb3af8ab Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/eb3af8abe9743669887445f8fc5ff647187f983a Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8306901: Macro offset_of confuses Eclipse CDT Reviewed-by: stuefe, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/13668 From dnsimon at openjdk.org Fri Apr 28 07:30:22 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 28 Apr 2023 07:30:22 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 14:21:23 GMT, Coleen Phillimore wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove bool argument from ConstMethodFlags.set function. Marked as reviewed by dnsimon (Committer). 
Thankfully all these changes only impact values read by JVMCI Java code and none in [Graal Java code](https://github.com/oracle/graal/blob/114067fc41d97e6c07f6de9bd745196d6f967ae4/compiler/src/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java#L47). Looks good to me. ------------- PR Review: https://git.openjdk.org/jdk/pull/13654#pullrequestreview-1405368377 PR Comment: https://git.openjdk.org/jdk/pull/13654#issuecomment-1527109990 From rkennke at openjdk.org Fri Apr 28 07:43:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Apr 2023 07:43:24 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v65] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
> This would be implemented in a separate PR
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 162 commits:

 - Merge branch 'master' into JDK-8291555-v2
 - Fix formatting
 - Suggestios by @dcubed-ojdk
 - Suggested changes by @merykitty
 - Remove unnecessary comments
 - Simple build fix for extra arches
 - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2
 - A few more LM_ prefixes in 32bit code
 - Replace UseHeavyMonitor with LockingMode == LM_MONITOR
 - Prefix LockingMode constants with LM_*
 - ...
and 152 more: https://git.openjdk.org/jdk/compare/3d9d84b7...d90810f1 ------------- Changes: https://git.openjdk.org/jdk/pull/10907/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=64 Stats: 2493 lines in 68 files changed: 1675 ins; 104 del; 714 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From duke at openjdk.org Fri Apr 28 08:05:53 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Fri, 28 Apr 2023 08:05:53 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 14:26:02 GMT, Martin Doerr wrote: >> Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed jlong conversion order > > Would be helpful if you could enable the Pre-submit test (GitHub actions). > @TheRealMDoerr , @w-kudla I have filed the CSR request for this change and reviewed it. Huge thanks, @dholmes-ora. I'm about to start working on the test case. Should update this PR soon. 
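As an illustration of why conversion order matters when a fractional millisecond delay is turned into an integer nanosecond count, here is a small sketch. It is only an assumption that this is the kind of issue behind the "Fixed jlong conversion order" commit; the function names are hypothetical and this is not the code from the pull request.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; not the code from the pull request.

// Scale first, then truncate: the fractional part survives the scaling,
// so 0.5 ms becomes 500000 ns.
inline std::int64_t millis_to_nanos_scale_first(double millis) {
  return static_cast<std::int64_t>(millis * 1000000.0);
}

// Truncate first, then scale: the fractional part is discarded before the
// multiplication, so 0.5 ms becomes 0 ns.
inline std::int64_t millis_to_nanos_truncate_first(double millis) {
  return static_cast<std::int64_t>(millis) * 1000000;
}
```

With a delay of 2.25 ms, the first form yields 2250000 ns while the second yields 2000000 ns, which would silently ignore the fractional part that the option is meant to support.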
------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1527144072 From duke at openjdk.org Fri Apr 28 08:07:53 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Fri, 28 Apr 2023 08:07:53 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v2] In-Reply-To: <1z2ZOeorI_1mGoZ5NvNqutTolY7XrpA5LyZvxzkyOhU=.11e67f25-138b-462f-b966-f296bdf86c2f@github.com> References: <1z2ZOeorI_1mGoZ5NvNqutTolY7XrpA5LyZvxzkyOhU=.11e67f25-138b-462f-b966-f296bdf86c2f@github.com> Message-ID: On Thu, 27 Apr 2023 05:53:57 GMT, David Holmes wrote: >> Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed jlong conversion order > > src/hotspot/share/utilities/globalDefinitions.hpp line 170: > >> 168: >> 169: // Format jdouble with defined precision >> 170: #define JDOUBLE_FORMAT_P(precision) "%." #precision "f" > > This is not necessary. We only define macros when there are platform differences with format specifiers. Just use `%.6f` directly in the code. Will do, thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13373#discussion_r1180056176 From stuefe at openjdk.org Fri Apr 28 08:09:54 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Apr 2023 08:09:54 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v65] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 07:43:24 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). 
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. 
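The lock-stack described in the quoted text can be sketched roughly as follows. This is a simplified model under stated assumptions: the class and member names are hypothetical, object references are plain pointers rather than oops, and the CAS on the object header, the ANONYMOUS_OWNER handling, and GC root scanning are all left out. It only shows the fixed capacity of 8, the linear ownership scan, and the two cases (overflow and recursion) in which the caller would fall back to inflating a monitor.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for an object reference (an oop in HotSpot).
using ObjRef = const void*;

// Hypothetical per-thread lock stack; names and layout are assumptions,
// not the data structure from the pull request.
class LockStackSketch {
 public:
  static constexpr std::size_t CAPACITY = 8;  // fixed size quoted above

  bool is_full() const { return _top == CAPACITY; }

  // "Does the current thread own me?" answered by a linear scan.
  bool contains(ObjRef obj) const {
    for (std::size_t i = 0; i < _top; i++) {
      if (_elems[i] == obj) return true;
    }
    return false;
  }

  // Fast-path lock attempt. Returns false when the caller would have to take
  // the slow path and inflate to a monitor: either the stack is full or the
  // lock is recursive (already held by this thread).
  bool try_push(ObjRef obj) {
    if (is_full() || contains(obj)) return false;
    _elems[_top++] = obj;
    return true;
  }

  // Fast-path unlock: remove obj, keeping the remaining entries packed.
  // Returns false if this thread did not hold obj via the fast path.
  bool remove(ObjRef obj) {
    for (std::size_t i = 0; i < _top; i++) {
      if (_elems[i] == obj) {
        for (std::size_t j = i + 1; j < _top; j++) _elems[j - 1] = _elems[j];
        _top--;
        return true;
      }
    }
    return false;
  }

  std::size_t size() const { return _top; }

 private:
  ObjRef _elems[CAPACITY] = {};
  std::size_t _top = 0;
};
```

Because the array is tiny and densely packed, the ownership query stays a short scan, which matches the claim that the common case is cheap while rarer queries such as "which thread owns X?" can afford to walk every thread's stack.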
>>
>> This change enables to simplify (and speed-up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
> The pull request now contains 162 commits:
>
>  - Merge branch 'master' into JDK-8291555-v2
>  - Fix formatting
>  - Suggestios by @dcubed-ojdk
>  - Suggested changes by @merykitty
>  - Remove unnecessary comments
>  - Simple build fix for extra arches
>  - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2
>  - A few more LM_ prefixes in 32bit code
>  - Replace UseHeavyMonitor with LockingMode == LM_MONITOR
>  - Prefix LockingMode constants with LM_*
>  - ... and 152 more: https://git.openjdk.org/jdk/compare/3d9d84b7...d90810f1

Thanks for the merge.

The only thing hindering ppcle from building is this:

diff --git a/src/hotspot/share/runtime/arguments.cpp b/src/hotspot/share/runtime/arguments.cpp
index c758f301cc9..c46bb6c3b5d 100644
--- a/src/hotspot/share/runtime/arguments.cpp
+++ b/src/hotspot/share/runtime/arguments.cpp
@@ -1984,7 +1984,7 @@ bool Arguments::check_vm_args_consistency() {
 #if !defined(X86) && !defined(AARCH64) && !defined(RISCV64) && !defined(ARM)
   if (LockingMode == LM_LIGHTWEIGHT) {
-    FLAG_SET_CMDLINE(LockingMode, LEGACY);
+    FLAG_SET_CMDLINE(LockingMode, LM_LEGACY);
     warning("New lightweight locking not supported on this platform");
   }
 #endif

Cannot build s390 because build is broken upstream (https://bugs.openjdk.org/browse/JDK-8307093)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1527147139

From ayang at openjdk.org Fri Apr 28 08:59:54 2023
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 28 Apr 2023 08:59:54 GMT
Subject: RFR: 8307005: Make CardTableBarrierSet::initialize non-virtual
Message-ID: <-H7u-j35jnRfjrN90FQm5tauUZL3zVLnb_H2iVewmBw=.5227d5b5-3756-4908-9cf5-b7d1fd755955@github.com>

Trivial removing `virtual` specifier.
------------- Commit messages: - trivial Changes: https://git.openjdk.org/jdk/pull/13713/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13713&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307005 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13713.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13713/head:pull/13713 PR: https://git.openjdk.org/jdk/pull/13713 From stuefe at openjdk.org Fri Apr 28 09:01:26 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Apr 2023 09:01:26 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62] In-Reply-To: <1M7ql10BxRIBDp38CNNb_D0i6CE4O-97lGFO7iDRaFI=.570de882-cda2-4582-a4eb-c49cbc38aff6@github.com> References: <1M7ql10BxRIBDp38CNNb_D0i6CE4O-97lGFO7iDRaFI=.570de882-cda2-4582-a4eb-c49cbc38aff6@github.com> Message-ID: On Wed, 26 Apr 2023 18:35:45 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 649: >> >>> 647: >>> 648: __ flush(); >>> 649: return AdapterHandlerLibrary::new_entry(fingerprint, i2c_entry, c2i_entry, c2i_unverified_entry); >> >> This change seems out of place... what's the story here? > > This is a local revert of *8303154: Investigate and improve instruction cache flushing during compilation* - the missing flush caused random crashes, but I did not have time to investigate. I reverted the flush, crashes were gone. > > If needed I may revisit this when there is time. I'll remove it. Tests have to wait on arm for https://bugs.openjdk.org/browse/JDK-8305387 though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1180108118 From stuefe at openjdk.org Fri Apr 28 09:24:05 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Apr 2023 09:24:05 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v60] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 18:58:13 GMT, Roman Kennke wrote: >>> @rkennke - I'm planning to do another crawl thru review next week. >> >> Thanks! That is greatly appreciated! > >> @rkennke - finished my second crawl thru review of 60/68 files changed. I only skipped the RISC-V files since I know nada about that platform... >> >> My Mach5 testing of v61 is running Tier7 and I hope to start Tier8 later tonight. So far all testing looks good, but I'll include the usual summary comment in the bug report... > > Thanks so much for reviewing this large PR (so many times)! I believe I have incorporated all your suggestions (or left a comment/question when it wasn't clear). > > Cheers, > Roman @rkennke Last ARM32 fixes: https://gist.github.com/tstuefe/8a0fd30618f1d0e085b5ca12d7c156cd I removed the superfluous flush from sharedRuntime. For a test, I applied https://github.com/openjdk/jdk/pull/13596 patch and built and tested arm (starting fastlockbench with interpreted, c1, c2), all seems to be well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1527233512 From vkempik at openjdk.org Fri Apr 28 09:31:52 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 28 Apr 2023 09:31:52 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v3] In-Reply-To: <_eZoblPOvxkwGY38lHkL29QyncwAGIOkUNBu-GzDYlY=.f05d9d52-4f26-4064-be79-34276ce6ab38@github.com> References: <_eZoblPOvxkwGY38lHkL29QyncwAGIOkUNBu-GzDYlY=.f05d9d52-4f26-4064-be79-34276ce6ab38@github.com> Message-ID: On Fri, 28 Apr 2023 03:53:29 GMT, Fei Yang wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> spaces > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 1955: > >> 1953: target &= ~mask; >> 1954: target |= val; >> 1955: sd_c_instr(a, target); > > Compressed instructions are supposed to be located at some addresses at least 2-bytes aligned. So I am thinking that there shouldn't be any unaligned access happening here. on one hand - you are right, on another one - this way it looks more unified with 4-byte opcodes loads/stores ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1180156370 From vkempik at openjdk.org Fri Apr 28 09:58:53 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 28 Apr 2023 09:58:53 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v4] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
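As a rough illustration of the first part of the patch described above (this is not the HotSpot code): one portable way to read or patch a 32-bit instruction word without issuing a misaligned access is to go through `memcpy`, or to assemble the value from 16-bit parcels, which works because RISC-V instructions are at least 2-byte aligned and stored low parcel first. The helper names below are made up for this sketch and are not `put_native_uX`/`get_native_uX`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value from an address with no alignment guarantee.
// memcpy leaves it to the compiler to pick an access sequence that is legal for the target.
inline uint32_t load_u4_any_alignment(const void* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Hypothetical helper: write a 32-bit value to an address with no alignment guarantee.
inline void store_u4_any_alignment(void* p, uint32_t v) {
  std::memcpy(p, &v, sizeof(v));
}

// Hypothetical helper: assemble a 32-bit instruction from two 16-bit parcels.
// The lower-addressed parcel holds the low 16 bits; only 2-byte alignment is required.
inline uint32_t load_insn32_from_parcels(const uint16_t* parcels) {
  assert(reinterpret_cast<uintptr_t>(parcels) % alignof(uint16_t) == 0);
  return static_cast<uint32_t>(parcels[0]) |
         (static_cast<uint32_t>(parcels[1]) << 16);
}
```

Whether the parcel-wise form or the `memcpy` form is cheaper depends on the core, which is presumably why the thread reports a large gain on one board and none on hardware that handles misaligned accesses itself.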
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Remove unused macros ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/8323870d..8b9aa84c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From avoitylov at openjdk.org Fri Apr 28 09:58:00 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Fri, 28 Apr 2023 09:58:00 GMT Subject: Integrated: JDK-8305387: JDK-8301995 breaks arm 32-bit In-Reply-To: References: Message-ID: On Sat, 22 Apr 2023 06:54:59 GMT, Aleksei Voitylov wrote: > Provides missing implementation for arm32. > > Testing: hotspot/jtreg. This pull request has now been integrated. 
Changeset: 60a29a66 Author: Aleksei Voitylov Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/60a29a668c07cf7c15728063b19bb235c5fd2052 Stats: 106 lines in 4 files changed: 93 ins; 3 del; 10 mod 8305387: JDK-8301995 breaks arm 32-bit Reviewed-by: shade, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/13596 From stuefe at openjdk.org Fri Apr 28 10:45:25 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Apr 2023 10:45:25 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v65] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 07:43:24 GMT, Roman Kennke wrote: >> [...] > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 162 commits: > > - Merge branch 'master' into JDK-8291555-v2 > - Fix formatting > - Suggestios by @dcubed-ojdk > - Suggested changes by @merykitty > - Remove unnecessary comments > - Simple build fix for extra arches > - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2 > - A few more LM_ prefixes in 32bit code > - Replace UseHeavyMonitor with LockingMode == LM_MONITOR > - Prefix LockingMode constants with LM_* > - ...
and 152 more: https://git.openjdk.org/jdk/compare/3d9d84b7...d90810f1 I gave it a final inspection (as of https://github.com/openjdk/jdk/pull/10907/commits/1323e9584ee2303da0553e65c3968b5e81394b06). If you take over my small arm32 and ppcle changes, the patch looks good from my side. We should increase the number of reviewers though, and at least Dan should ok it as well. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1405683129 From rkennke at openjdk.org Fri Apr 28 11:32:54 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Apr 2023 11:32:54 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v66] In-Reply-To: References: Message-ID: <-Kq6LaQmYZC8PVnmA4IH6QflBHwDB8__ovkqWOGFjeE=.451a7a23-578d-4b7f-b55d-74759c2cc446@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
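To make the stack-locking paragraph above concrete, here is a minimal sketch of the header encoding it describes: the displaced header is saved in a stack slot, and a pointer to that slot (whose low two bits are 00) is CASed into the object header. The lock-bit values follow the description and HotSpot's mark word convention (01 unlocked, 00 stack-locked); the `Object` and `StackLock` types and all function names are invented for this sketch, and recursion, contention and inflation are deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uintptr_t kLockMask    = 0x3;  // lowest two header bits
constexpr uintptr_t kUnlocked    = 0x1;  // 01: unlocked
constexpr uintptr_t kStackLocked = 0x0;  // 00: stack-locked

// Hypothetical object with only a header word.
struct Object {
  std::atomic<uintptr_t> header;
  explicit Object(uintptr_t h) : header(h) {}
};

// Hypothetical on-stack lock record: holds only the displaced header.
// The alignment guarantees that its address has 00 in the low two bits.
struct alignas(8) StackLock {
  uintptr_t displaced_header = 0;
};

// Try to stack-lock obj using a slot on the caller's stack.
// Returns false if the object is not unlocked (a real VM would take a slow path).
inline bool try_stack_lock(Object& obj, StackLock& slot) {
  uintptr_t old_header = obj.header.load();
  if ((old_header & kLockMask) != kUnlocked) return false;
  slot.displaced_header = old_header;
  const uintptr_t locked = reinterpret_cast<uintptr_t>(&slot);
  assert((locked & kLockMask) == kStackLocked);
  return obj.header.compare_exchange_strong(old_header, locked);
}

// Ownership query: does the header point into the given stack range?
inline bool header_points_into_stack(const Object& obj, uintptr_t lo, uintptr_t hi) {
  const uintptr_t h = obj.header.load();
  return (h & kLockMask) == kStackLocked && h >= lo && h < hi;
}

// Unlock by swinging the displaced header back into the object.
inline bool stack_unlock(Object& obj, StackLock& slot) {
  uintptr_t expected = reinterpret_cast<uintptr_t>(&slot);
  return obj.header.compare_exchange_strong(expected, slot.displaced_header);
}
```

While the lock is held, the real header lives only in the stack slot, so anything that wants to read it has to chase that pointer; that is the overloading the description objects to.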
> > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor.
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix arm and ppcle builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/d90810f1..50f1369f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=65 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=64-65 Stats: 4 lines in 3 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From ayang at openjdk.org Fri Apr 28 12:25:26 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 28 Apr 2023 12:25:26 GMT Subject: RFR: 8307100: Remove ReferentBasedDiscovery reference discovery policy Message-ID: Mostly consisting of mechanical refactoring after replacing `RefDiscoveryPolicy == ...` with `true` or `false`. Test: tier1-6 ------------- Commit messages: - remove-referent-policy Changes: https://git.openjdk.org/jdk/pull/13715/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13715&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307100 Stats: 87 lines in 6 files changed: 2 ins; 68 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/13715.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13715/head:pull/13715 PR: https://git.openjdk.org/jdk/pull/13715 From sjohanss at openjdk.org Fri Apr 28 13:10:57 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 28 Apr 2023 13:10:57 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared Message-ID: Hi all, Please review this change to avoid CleanClassLoaderDataMetaspaces safepoint when there is nothing that can be cleaned up. **Summary** When transforming/redefining classes a previous version list is linked together in the InstanceKlass.
The original class is added to this list if it is still used or shared. The difference between shared and used is not currently noted. This leads to a problem when doing concurrent class unloading, because during that we postpone some potential work to a safepoint (since we are not in one). This is the CleanClassLoaderDataMetaspaces and it is triggered by the ServiceThread if there is work to be done, for example if InstanceKlass::_has_previous_versions is true. Since we currently do not differentiate between shared and "in use" we always set _has_previous_versions if anything is on this list. This together with the fact that shared previous versions should never be cleaned out leads to this safepoint being triggered after every concurrent class unloading even though there is nothing that can be cleaned out. This can be avoided by making sure the _previous_versions list is only cleaned when there are non-shared classes on it. This change renames `_has_previous_versions` to `_clean_previous_versions` and only updates it if we have non-shared classes on the list. **Testing** * A lot of manual testing verifying that we do get the safepoint when we should. * Added new test to verify expected behavior by parsing the logs. The test uses JFR to trigger redefinition of some shared classes (when -Xshare:on).
* Mach5 run of new test and tier 1-3 ------------- Commit messages: - Add test wih JFR - 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared Changes: https://git.openjdk.org/jdk/pull/13716/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13716&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306929 Stats: 139 lines in 6 files changed: 116 ins; 3 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/13716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13716/head:pull/13716 PR: https://git.openjdk.org/jdk/pull/13716 From mdoerr at openjdk.org Fri Apr 28 13:33:24 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Apr 2023 13:33:24 GMT Subject: RFR: 8307104: [AIX] VM crashes with UseRTMLocking on Power10 Message-ID: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> We need to prevent usage of transactional memory (UseRTMLocking) on Power10 which doesn't support it. The VM crashes with SIGILL on AIX when trying to use it. I'm also changing the AIX specific check for the case in which somebody uses Power10 with -XX:PowerArchitecturePPC64=8 (or 9). The Linux specific code is fine as it is. This change is small and should get considered for backports. We may remove the RTM code completely for future JDKs. 
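A minimal sketch of the kind of check described above, under the assumption that transactional memory exists on Power8 and Power9 hardware but not on Power10: the decision has to look at what the machine really is, not only at the value passed via `-XX:PowerArchitecturePPC64`. The struct, its fields and both functions are hypothetical and do not mirror the actual PPC64 `VM_Version` code or the 3-line patch.

```cpp
#include <cassert>

// Hypothetical description of the CPU as seen at VM start-up.
struct PowerCpu {
  int hardware_version;   // what the machine really is, e.g. 8, 9 or 10
  int requested_version;  // what -XX:PowerArchitecturePPC64=<n> asked for
  bool os_enables_tm;     // whether the OS reports transactional memory as usable
};

// Transactional memory is usable only if the *hardware* has it.
// A Power10 machine started with PowerArchitecturePPC64=8 must still be rejected,
// so the real hardware version is checked in addition to the requested one.
inline bool rtm_locking_usable(const PowerCpu& cpu) {
  const bool hardware_has_tm  = cpu.hardware_version >= 8 && cpu.hardware_version < 10;
  const bool requested_has_tm = cpu.requested_version >= 8 && cpu.requested_version < 10;
  return hardware_has_tm && requested_has_tm && cpu.os_enables_tm;
}

// Resolve the effective flag value: honour the user's request only when it is safe.
inline bool resolve_use_rtm_locking(bool user_requested, const PowerCpu& cpu) {
  return user_requested && rtm_locking_usable(cpu);
}
```

Whether a rejected request should produce a warning or a start-up error is a separate policy question that this sketch does not try to answer.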
------------- Commit messages: - 8307104: [AIX] VM crashes with UseRTMLocking on Power10 Changes: https://git.openjdk.org/jdk/pull/13717/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13717&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307104 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13717/head:pull/13717 PR: https://git.openjdk.org/jdk/pull/13717 From jwaters at openjdk.org Fri Apr 28 13:33:53 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 28 Apr 2023 13:33:53 GMT Subject: RFR: 8306901: Macro offset_of confuses Eclipse CDT In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 10:00:57 GMT, Thomas Stuefe wrote: >>> > > Also I wonder whether the ATTRIBUTE_ALIGNED(16) would be more correctly `alignas(klass)` but that is pre-existing. >>> > >>> > >>> > Probably :) At least it would be cleaner in my eyes than the hard coded 16. With this pr I'd like to address just the CDT issue though. >>> >>> I believe the trick may not work if the type were artificially aligned to something larger than 16, e.g. struct XX alignas(64). I guess that depends on what the compiler does when casting a pointer with a smaller alignment to something with a larger alignment. >> >> Maybe. I'm not familiar enough with `alignas` right now. Also I would have to 'meditate' a bit about [the alignas-part in the hotspot-style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#alignas). >> Right now I wouldn't expect extended alignment of a type to affect member offsets. But that might be naive. > >> > > > Also I wonder whether the ATTRIBUTE_ALIGNED(16) would be more correctly `alignas(klass)` but that is pre-existing. >> > > >> > > >> > > Probably :) At least it would be cleaner in my eyes than the hard coded 16. With this pr I'd like to address just the CDT issue though.
>> > >> > >> > I believe the trick may not work if the type were artificially aligned to something larger than 16, e.g. struct XX alignas(64). I guess that depends on what the compiler does when casting a pointer with a smaller alignment to something with a larger alignment. >> >> Maybe. I'm not familiar enough with `alignas` right now. Also I would have to 'meditate' a bit about [the alignas-part in the hotspot-style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#alignas). Right now I wouldn't expect extended alignment of a type to affect member offsets. But that might be naive. > > Checked with GCC and clang, it seems to work. But I think it's still UB (otherwise we could just omit the ALIGN specifier completely). @tstuefe Currently there is a change planned to replace the macro in some place with alignas (see #11431) but I haven't had time to get round to finishing it just yet ------------- PR Comment: https://git.openjdk.org/jdk/pull/13668#issuecomment-1527547745 From mdoerr at openjdk.org Fri Apr 28 13:42:23 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Apr 2023 13:42:23 GMT Subject: RFR: 8307104: [AIX] VM crashes with UseRTMLocking on Power10 In-Reply-To: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> References: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> Message-ID: <7vH1iSccFdFhPWdGm1OJCCQGS4kQs4LVEkXVZSSV4ME=.fd2afba6-e5e6-4a4e-b573-8b96cf170e9c@github.com> On Fri, 28 Apr 2023 13:13:41 GMT, Martin Doerr wrote: > We need to prevent usage of transactional memory (UseRTMLocking) on Power10 which doesn't support it. The VM crashes with SIGILL on AIX when trying to use it. > > I'm also changing the AIX specific check for the case in which somebody uses Power10 with -XX:PowerArchitecturePPC64=8 (or 9). > The Linux specific code is fine as it is. > > This change is small and should get considered for backports.
We may remove the RTM code completely for future JDKs. @backwaterred: Please take a look. You will probably want to backport this fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13717#issuecomment-1527569032 From duke at openjdk.org Fri Apr 28 13:46:57 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Fri, 28 Apr 2023 13:46:57 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v3] In-Reply-To: References: Message-ID: > As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. > This is immensely useful for investigating time-to-safepoint issues in low latency space. Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: Removed dedicated format declaration for jdouble, added test case for floating point type of -XX:SafepointTimeoutDelay ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13373/files - new: https://git.openjdk.org/jdk/pull/13373/files/a8fdc733..7c49b4d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=01-02 Stats: 16 lines in 3 files changed: 6 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13373/head:pull/13373 PR: https://git.openjdk.org/jdk/pull/13373 From clanger at openjdk.org Fri Apr 28 14:32:24 2023 From: clanger at openjdk.org (Christoph Langer) Date: Fri, 28 Apr 2023 14:32:24 GMT Subject: RFR: 8307104: [AIX] VM crashes with UseRTMLocking on Power10 In-Reply-To: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> References: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> Message-ID: 
<-7t4DIAUdlsjnqqfUtdZxef5ri5xS8bcMjC4UgjdhqI=.7bf8557e-d437-4729-ab82-de9f9b6dcdab@github.com> On Fri, 28 Apr 2023 13:13:41 GMT, Martin Doerr wrote: > We need to prevent usage of transactional memory (UseRTMLocking) on Power10 which doesn't support it. The VM crashes with SIGILL on AIX when trying to use it. > > I'm also changing the AIX specific check for the case in which somebody uses Power10 with -XX:PowerArchitecturePPC64=8 (or 9). > The Linux specific code is fine as it is. > > This change is small and should get considered for backports. We may remove the RTM code completely for future JDKs. LGTM ------------- Marked as reviewed by clanger (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13717#pullrequestreview-1406043852 From matsaave at openjdk.org Fri Apr 28 14:32:53 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 28 Apr 2023 14:32:53 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 14:21:23 GMT, Coleen Phillimore wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove bool argument from ConstMethodFlags.set function. Nice change! Just some small nits but it otherwise looks good. 
src/hotspot/share/oops/method.hpp line 606: > 604: > 605: bool compute_has_loops_flag(); > 606: bool set_has_loops() { set_has_loops_flag(); set_has_loops_flag_init(); return true; } Since this has multiple statements it should probably be on different lines src/hotspot/share/oops/method.hpp line 615: > 613: // has not been computed yet. > 614: bool guaranteed_monitor_matching() const { return monitor_matching(); } > 615: void set_guaranteed_monitor_matching() { set_monitor_matching(); } Is this method just obsolete now? If so it might be worth replacing the callers with `set_monitor_matching()` unless `set_monitor_matching()` is still meant to be private. ------------- Changes requested by matsaave (Committer). PR Review: https://git.openjdk.org/jdk/pull/13654#pullrequestreview-1406036462 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1180462644 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1180472698 From coleenp at openjdk.org Fri Apr 28 15:43:03 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Apr 2023 15:43:03 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 14:18:11 GMT, Matias Saavedra Silva wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove bool argument from ConstMethodFlags.set function. > > src/hotspot/share/oops/method.hpp line 615: > >> 613: // has not been computed yet. >> 614: bool guaranteed_monitor_matching() const { return monitor_matching(); } >> 615: void set_guaranteed_monitor_matching() { set_monitor_matching(); } > > Is this method just obsolete now? If so it might be worth replacing the callers with `set_monitor_matching()` unless `set_monitor_matching()` is still meant to be private. The reason I left that was to anchor the comment. There is nowhere good to put that in the X macro. Also, didn't want to fix the callers. 
It's a good point about making monitor_matching() private, but also not really doable with the X macro. So that's why I left it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1180548707 From coleenp at openjdk.org Fri Apr 28 15:42:58 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Apr 2023 15:42:58 GMT Subject: RFR: 8306851: Move Method access flags [v4] In-Reply-To: References: Message-ID: > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. > > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. 
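For readers unfamiliar with the pattern mentioned in this thread, the following is a minimal sketch of runtime-set flags generated by an X macro and updated atomically. It is not the code from the PR: `ExampleMethodFlags`, `EXAMPLE_METHOD_FLAGS_DO`, the flag list, and the bit positions are all hypothetical, and `std::atomic` stands in for whatever atomic primitives HotSpot actually uses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical flag list; names and bit values are illustrative only.
#define EXAMPLE_METHOD_FLAGS_DO(flag)    \
  flag(queued_for_compilation, 1u << 0)  \
  flag(has_loops_flag,         1u << 1)  \
  flag(monitor_matching,       1u << 2)

class ExampleMethodFlags {
  // Flags set at runtime may be written by several threads, so the word is atomic.
  std::atomic<uint32_t> _status{0};

  // One enum constant per flag, generated from the list above.
  enum : uint32_t {
#define EXAMPLE_FLAG_ENUM(name, value) _misc_##name = value,
    EXAMPLE_METHOD_FLAGS_DO(EXAMPLE_FLAG_ENUM)
#undef EXAMPLE_FLAG_ENUM
  };

 public:
  // One getter and one setter per flag, also generated from the list.
  // The setter takes no bool argument: it only ever sets the bit.
#define EXAMPLE_FLAG_ACCESSORS(name, ignore)                                  \
  bool name() const {                                                         \
    return (_status.load(std::memory_order_relaxed) & _misc_##name) != 0;     \
  }                                                                           \
  void set_##name() {                                                         \
    _status.fetch_or(_misc_##name, std::memory_order_relaxed);                \
  }
  EXAMPLE_METHOD_FLAGS_DO(EXAMPLE_FLAG_ACCESSORS)
#undef EXAMPLE_FLAG_ACCESSORS

  // Raw value, e.g. for printing the flags as an int next to the decoded form.
  uint32_t as_int() const { return _status.load(std::memory_order_relaxed); }
};
```

Because every accessor is stamped out by the same macro, there is no natural place to attach a per-flag comment or to make a single accessor private, which is the limitation discussed in the review comment above. Flags fixed at class file parse time would not need the atomic read-modify-write.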
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13654/files - new: https://git.openjdk.org/jdk/pull/13654/files/6687cc0e..9b05ab84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=02-03 Stats: 9 lines in 1 file changed: 5 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13654.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13654/head:pull/13654 PR: https://git.openjdk.org/jdk/pull/13654 From coleenp at openjdk.org Fri Apr 28 16:16:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Apr 2023 16:16:23 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 12:48:44 GMT, Stefan Johansson wrote: > Hi all, > > Please review this change to avoid CleanClassLoaderDataMetaspaces safepoint when there is nothing that can be cleaned up. > > **Summary** > When transforming/redefining classes a previous version list is linked together in the InstanceKlass. The original class is added to this list if it is still used or shared. The difference between shared and used is not currently noted. This leads to a problem when doing concurrent class unloading, because during that we postpone some potential work to a safepoint (since we are not in one). This is the CleanClassLoaderDataMetaspaces and it is triggered by the ServiceThread if there is work to be done, for example if InstanceKlass::_has_previous_versions is true. > > Since we currently do not differentiate between shared and "in use" we always set _has_previous_versions if anything is on this list.
This together with the fact that shared previous versions should never be cleaned out leads to this safepoint being triggered after every concurrent class unloading even though there is nothing that can be cleaned out. > > This can be avoided by making sure the _previous_versions list is only cleaned when there are non-shared classes on it. This change renames `_has_previous_versions` to `_clean_previous_versions` and only updates it if we have non-shared classes on the list. > > **Testing** > * A lot of manual testing verifying that we do get the safepoint when we should. > * Added new test to verify expected behavior by parsing the logs. The test uses JFR to trigger redefinition of some shared classes (when -Xshare:on). > * Mach5 run of new test and tier 1-3 This looks good. Thanks for all the testing and adding the new test. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13716#pullrequestreview-1406222927 From mdoerr at openjdk.org Fri Apr 28 16:46:54 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Apr 2023 16:46:54 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v3] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 16:29:46 GMT, Wojciech Kudla wrote: > Looks like the checks for Windows are failing due to error while attempting to install Visual Studio... > I'm not sure how to progress this from here. This should already be fixed. Merging with master may resolve the problem if you want to have the tests green. Note that green tests are not a strict requirement. Integration can get approved while known problems exist in the Pre-submit tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1527812923 From duke at openjdk.org Fri Apr 28 16:46:53 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Fri, 28 Apr 2023 16:46:53 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v3] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 13:46:57 GMT, Wojciech Kudla wrote: >> As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. >> This is immensely useful for investigating time-to-safepoint issues in low latency space. > > Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: > > Removed dedicated format declaration for jdouble, added test case for floating point type of -XX:SafepointTimeoutDelay Looks like the checks for Windows are failing due to error while attempting to install Visual Studio... I'm not sure how to progress this from here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13373#issuecomment-1527806031 From dcubed at openjdk.org Fri Apr 28 17:00:58 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 28 Apr 2023 17:00:58 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v66] In-Reply-To: <-Kq6LaQmYZC8PVnmA4IH6QflBHwDB8__ovkqWOGFjeE=.451a7a23-578d-4b7f-b55d-74759c2cc446@github.com> References: <-Kq6LaQmYZC8PVnmA4IH6QflBHwDB8__ovkqWOGFjeE=.451a7a23-578d-4b7f-b55d-74759c2cc446@github.com> Message-ID: On Fri, 28 Apr 2023 11:32:54 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements.
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix arm and ppcle builds This project is currently baselined on jdk-21+21-1701. However, that build-ID contains very noisy test failures in Tier[234] and probably higher. If you could rebase on: jiefu: [452cb8 - OpenJDK](https://orahub.oci.oraclecorp.com/jpg-mirrors/jdk-open/commit/452cb8432f4d45c3dacd4415bc9499ae73f7a17c) [8307103 ](http://bugs.openjdk.java.net/browse/JDK-8307103) Two TestMetaspaceAllocationMT tests fail after JDK-8306696 That would make my next Mach5 test cycle much, much happier... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1527824532 From fparain at openjdk.org Fri Apr 28 18:37:30 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 28 Apr 2023 18:37:30 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 14:21:23 GMT, Coleen Phillimore wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove bool argument from ConstMethodFlags.set function. Thank you for all those cleanings! Looks good to me. src/hotspot/share/oops/constMethod.cpp line 438: > 436: } > 437: st->cr(); > 438: st->print(" - flags: "); _flags.print_on(st); st->cr(); Method prints its flags as an int and in decoded form, but ConstMethod only prints the decoded form. Any particular reason for this difference? ------------- Marked as reviewed by fparain (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13654#pullrequestreview-1406421552 PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1180705106 From coleenp at openjdk.org Fri Apr 28 18:51:24 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Apr 2023 18:51:24 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 18:29:50 GMT, Frederic Parain wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove bool argument from ConstMethodFlags.set function. > > src/hotspot/share/oops/constMethod.cpp line 438: > >> 436: } >> 437: st->cr(); >> 438: st->print(" - flags: "); _flags.print_on(st); st->cr(); > > Method prints its flags as an int and in decoded form, but ConstMethod only prints the decoded form. Any particular reason for this difference? No reason for this difference. The only reason I added the int form for MethodFlags was because AccessFlags were printed also with the int form. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1180712196 From rkennke at openjdk.org Fri Apr 28 19:23:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Apr 2023 19:23:28 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v66] In-Reply-To: References: <-Kq6LaQmYZC8PVnmA4IH6QflBHwDB8__ovkqWOGFjeE=.451a7a23-578d-4b7f-b55d-74759c2cc446@github.com> Message-ID: On Fri, 28 Apr 2023 16:48:12 GMT, Daniel D. Daugherty wrote: > http://bugs.openjdk.java.net/browse/JDK-8307103 Should be based on JDK-8307103 now. Thanks for all your testing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1527972396 From rkennke at openjdk.org Fri Apr 28 19:23:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Apr 2023 19:23:24 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v67] In-Reply-To: References: Message-ID: <3Iabuiks5W03nXCOPejWEQAZMz1GqlvaZUmuvs5Bczs=.b8433f00-9394-437f-a7e1-db407bbba983@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
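The per-thread lock-stack described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the implementation in the PR: `ExampleLockStack`, its method names, and the `oop` alias are hypothetical stand-ins, and the mark-word CAS, GC root scanning, and inflation to a monitor are left out entirely.

```cpp
#include <array>
#include <cstddef>

// Stand-in for HotSpot's oop type; in this sketch it is just an opaque pointer.
using oop = const void*;

// Hypothetical fixed-size, thread-local lock-stack (8 entries, as in the description).
class ExampleLockStack {
  static constexpr std::size_t CAPACITY = 8;
  std::array<oop, CAPACITY> _base{};
  std::size_t _top = 0;  // number of entries currently in use

 public:
  std::size_t size() const { return _top; }
  bool is_full() const { return _top == CAPACITY; }

  // Fast path on lock: record the object. Returns false when the stack is
  // full, in which case the caller would take the slow path and inflate.
  bool try_push(oop o) {
    if (is_full()) return false;
    _base[_top++] = o;
    return true;
  }

  // "Does the current thread own me?" is a short linear scan, newest first.
  bool contains(oop o) const {
    for (std::size_t i = _top; i-- > 0;) {
      if (_base[i] == o) return true;
    }
    return false;
  }

  // On unlock: drop the entry, shifting later entries down so the array
  // stays densely packed even if unlock order is not strictly LIFO.
  bool remove(oop o) {
    for (std::size_t i = _top; i-- > 0;) {
      if (_base[i] == o) {
        for (std::size_t j = i + 1; j < _top; ++j) _base[j - 1] = _base[j];
        --_top;
        return true;
      }
    }
    return false;
  }
};
```

With only a handful of entries, the linear scan in `contains` is cheap, which matches the observation that most workloads hold fewer than five locks per thread at a time. A dense array like this is also what would make saving and restoring the stack for a virtual thread straightforward, although, as noted, the PR does not attempt that.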
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 164 commits:

- Merge commit '452cb8432f4d45c3dacd4415bc9499ae73f7a17c' into JDK-8291555-v2
- Fix arm and ppcle builds
- Merge branch 'master' into JDK-8291555-v2
- Fix formatting
- Suggestios by @dcubed-ojdk
- Suggested changes by @merykitty
- Remove unnecessary comments
- Simple build fix for extra arches
- Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2
- A few more LM_ prefixes in 32bit code
- ...
and 154 more: https://git.openjdk.org/jdk/compare/452cb843...39b199b6 ------------- Changes: https://git.openjdk.org/jdk/pull/10907/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=66 Stats: 2492 lines in 68 files changed: 1674 ins; 104 del; 714 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Apr 28 19:27:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Apr 2023 19:27:53 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable Message-ID: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. 
Testing: - [x] tier1 - [x] tier2 ------------- Commit messages: - 8305903: Deflate monitors of dead objects before they become unreachable Changes: https://git.openjdk.org/jdk/pull/13721/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13721&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305903 Stats: 24 lines in 4 files changed: 23 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13721.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13721/head:pull/13721 PR: https://git.openjdk.org/jdk/pull/13721 From matsaave at openjdk.org Fri Apr 28 19:37:53 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 28 Apr 2023 19:37:53 GMT Subject: RFR: 8306851: Move Method access flags [v4] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 15:42:58 GMT, Coleen Phillimore wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. >> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Updates Looks good, thanks! ------------- Marked as reviewed by matsaave (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/13654#pullrequestreview-1406485565 From coleenp at openjdk.org Fri Apr 28 20:00:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Apr 2023 20:00:53 GMT Subject: RFR: 8306851: Move Method access flags [v3] In-Reply-To: References: Message-ID: <7o09JnIhYwcYOJhT1zl2necGdXP4bW4rZLmsujrIdmA=.8361d936-5061-43b2-8678-687ef1a94b74@github.com> On Fri, 28 Apr 2023 18:39:21 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/constMethod.cpp line 438: >> >>> 436: } >>> 437: st->cr(); >>> 438: st->print(" - flags: "); _flags.print_on(st); st->cr(); >> >> Method prints its flags as an int and in decoded form, but ConstMethod only prints the decoded form. Any particular reason for this difference? > > No reason for this difference. The only reason I added the int form for MethodFlags was because AccessFlags were printed also with the int form. I fixed the constMethod printing with the last commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13654#discussion_r1180757104 From coleenp at openjdk.org Fri Apr 28 19:59:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Apr 2023 19:59:53 GMT Subject: RFR: 8306851: Move Method access flags [v5] In-Reply-To: References: Message-ID: <9mcZrjg-k3wLBxbR3dCguWSBKxZkZJVGtQLsV30bMhI=.e9b2c774-c968-46e4-9b92-3e090edc07d5@github.com> > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. > > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix constMethod printing. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13654/files - new: https://git.openjdk.org/jdk/pull/13654/files/9b05ab84..d0de72ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13654&range=03-04 Stats: 6 lines in 3 files changed: 1 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13654.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13654/head:pull/13654 PR: https://git.openjdk.org/jdk/pull/13654 From dcubed at openjdk.org Fri Apr 28 20:54:53 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 28 Apr 2023 20:54:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v67] In-Reply-To: <3Iabuiks5W03nXCOPejWEQAZMz1GqlvaZUmuvs5Bczs=.b8433f00-9394-437f-a7e1-db407bbba983@github.com> References: <3Iabuiks5W03nXCOPejWEQAZMz1GqlvaZUmuvs5Bczs=.b8433f00-9394-437f-a7e1-db407bbba983@github.com> Message-ID: <1gD-39K3koUUdiL3rAs2BmLD-KTN0rZBIM3PacdGa7I=.28f43014-26b8-4538-bac8-9c09fb48c2bc@github.com> On Fri, 28 Apr 2023 19:23:24 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor.
It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 164 commits: > > - Merge commit '452cb8432f4d45c3dacd4415bc9499ae73f7a17c' into JDK-8291555-v2 > - Fix arm and ppcle builds > - Merge branch 'master' into JDK-8291555-v2 > - Fix formatting > - Suggestios by @dcubed-ojdk > - Suggested changes by @merykitty > - Remove unnecessary comments > - Simple build fix for extra arches > - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2 > - A few more LM_ prefixes in 32bit code > - ... and 154 more: https://git.openjdk.org/jdk/compare/452cb843...39b199b6 This project is now baselined on jdk-21+21-1704. I've started Mach5 testing of v66 with stack-locking as the default and Mach5 testing of v66 with a patch that forces fast-locking to be the default. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1528073011 From peter.kessler at os.amperecomputing.com Sat Apr 29 00:18:58 2023 From: peter.kessler at os.amperecomputing.com (Peter Kessler OS) Date: Sat, 29 Apr 2023 00:18:58 +0000 Subject: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch Message-ID: I notice that src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp MacroAssembler::lookup_interface_method loops over the itable list with code that uses two branches: one to check for a null indicating the end of the list, and one to see if the appropriate entry has been found. aarch64 has a "ccmp" instruction that can be used to evaluate two conditions with only one branch. On an out-of-order implementation with more integer execution units than branch units, the trading of a branch for a ccmp can be beneficial. The downside is that one has to check, after the loop has exited, which of the conditions cause the loop to exit, but if the loop executes more than once or twice, that is still a win. There are other opportunities to use cmp;ccmp;br instead of cmp;br;cmp;br. 
I happened to see the one in MacroAssembler::lookup_interface_method because it was in what passes for hand-written assembler in HotSpot. For generic searches for a key in a key-value array the improvement can be ~10% on an Ampere Altra, depending on how far down the key-value array one has to look. I am only proposing to fix the loop in MacroAssembler::lookup_interface_method, but I would be interested in talking to people about where else the ccmp style could be applied. ... peter From kbarrett at openjdk.org Sat Apr 29 06:43:22 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 29 Apr 2023 06:43:22 GMT Subject: RFR: 8307100: Remove ReferentBasedDiscovery reference discovery policy In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 12:04:08 GMT, Albert Mingkun Yang wrote: > Mostly consisting of mechanical refactoring after replacing `RefDiscoveryPolicy == ...` with `true` or `false`. > > Test: tier1-6 Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13715#pullrequestreview-1406887413 From boris.ulasevich at bell-sw.com Sat Apr 29 07:48:55 2023 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sat, 29 Apr 2023 13:48:55 +0600 Subject: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch In-Reply-To: References: Message-ID: <19e76652-f635-5ad9-cf84-37e7b87b8adc@bell-sw.com> Hi Peter, Please have a look at JDK-8305959. I'm going to rewrite the itable stub codes to use a single pass over itable! I have an aarch64 implementation which shows improvement on Ampere Altra.
Boris On 4/29/2023 6:18 AM, Peter Kessler OS wrote: > > I notice that src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > MacroAssembler::lookup_interface_method loops over the itable list > with code that uses two branches: one to check for a null indicating > the end of the list, and one to see if the appropriate entry has been > found. aarch64 has a "ccmp" instruction that can be used to evaluate > two conditions with only one branch. On an out-of-order > implementation with more integer execution units than branch units, > the trading of a branch for a ccmp can be beneficial. The downside is > that one has to check, after the loop has exited, which of the > conditions cause the loop to exit, but if the loop executes more than > once or twice, that is still a win. > > There are other opportunities to use cmp;ccmp;br instead of > cmp;br;cmp;br. I happened to see the one in > MacroAssembler::lookup_interface_method because it was in what passes > for hand-written assembler in HotSpot. For generic searches for a key > in a key-value array the improvement can be ~10% on an Ampere Altra, > depending on how far down the key-value array one has to look. > > I am only proposing to fix the loop in > MacroAssembler::lookup_interface_method, but I would be interested in > talking to people about where else the ccmp style could be applied. > > ... peter > From fyang at openjdk.org Sat Apr 29 11:08:53 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 29 Apr 2023 11:08:53 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v4] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 09:58:53 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk.
>> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused macros Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2034: > 2032: } else { > 2033: __ ld(temp, Address(temp, 0)); > 2034: } This if-else could be further simplified into a single "__ lwu(temp, Address(temp, 0));" since only the low 32 bits of 'temp' register is used by following 'revb_w_w'. src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2065: > 2063: } else { > 2064: __ ld(temp, Address(temp, 0)); > 2065: } Similar here. This if-else could be further simplified into a single "__ lwu(temp, Address(temp, 0));" too. 
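
The reasoning behind the two review comments above can be sketched as follows. This is a minimal illustration, assuming a little-endian target: if only the low 32 bits of the loaded value are consumed by the following byte reversal, a zero-extending 32-bit load gives the same result as a 64-bit load. The helper names `load_ld`, `load_lwu` and `reverse_low_word` are hypothetical models of the instructions mentioned in the review, not HotSpot or assembler APIs, and the sketch says nothing about alignment behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of "ld": a 64-bit load (assumes a little-endian host).
static uint64_t load_ld(const unsigned char* p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Hypothetical model of "lwu": a 32-bit load, zero-extended to 64 bits.
static uint64_t load_lwu(const unsigned char* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Hypothetical model of a byte reversal that only looks at the low 32-bit word.
static uint32_t reverse_low_word(uint64_t x) {
  uint32_t w = static_cast<uint32_t>(x);
  return (w >> 24) | ((w >> 8) & 0xFF00u) | ((w << 8) & 0xFF0000u) | (w << 24);
}

// True if both load widths feed the same value into the byte reversal.
static bool same_result(const unsigned char* p) {
  return reverse_low_word(load_ld(p)) == reverse_low_word(load_lwu(p));
}
```

On a big-endian host the low word of the 64-bit load would come from different bytes, so the equivalence shown here is specific to the little-endian assumption.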
------------- PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1406968620 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1181053857 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1181054534 From fyang at openjdk.org Sat Apr 29 11:08:56 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 29 Apr 2023 11:08:56 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v3] In-Reply-To: References: <_eZoblPOvxkwGY38lHkL29QyncwAGIOkUNBu-GzDYlY=.f05d9d52-4f26-4064-be79-34276ce6ab38@github.com> Message-ID: On Fri, 28 Apr 2023 09:18:04 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/assembler_riscv.hpp line 1955: >> >>> 1953: target &= ~mask; >>> 1954: target |= val; >>> 1955: sd_c_instr(a, target); >> >> Compressed instructions are supposed to be located at some addresses at least 2-bytes aligned. So I am thinking that there shouldn't be any unaligned access happening here. > > on one hand - you are right, on another one - this way it looks more unified with 4-byte opcodes loads/stores The cost is a runtime check of actual alignment if you go this way. But this should not be the hot path, so I am OK to both ways. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1181050598 From boris.ulasevich at bell-sw.com Sat Apr 29 11:40:36 2023 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sat, 29 Apr 2023 17:40:36 +0600 Subject: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch In-Reply-To: <19e76652-f635-5ad9-cf84-37e7b87b8adc@bell-sw.com> References: <19e76652-f635-5ad9-cf84-37e7b87b8adc@bell-sw.com> Message-ID: Peter, I tried ccmp as part of improving itable stub on aarch64, and the results were not promising. Applying ccmp as suggested increased geomean from 15.7 ns to 15.9 ns on N1 and from 201 ns to 205 ns on A72. 
I don't think micro-architecture specialization in itable stub would bring universal benefits; it will only make code more complicated. I would appreciate your review of the AArch64 part of JDK-8305959 once I post it. thanks, Boris On 4/29/2023 1:48 PM, Boris Ulasevich wrote: > Hi Peter, > > Please have a look at JDK-8305959. I'm going to rewrite the itable > stub codes to use a single pass over itable! I have an aarch64 > implementation which shows improvement on Ampere Altra. > > Boris > > On 4/29/2023 6:18 AM, Peter Kessler OS wrote: >> >> I notice that src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> MacroAssembler::lookup_interface_method loops over the itable list >> with code that uses two branches: one to check for a null indicating >> the end of the list, and one to see if the appropriate entry has been >> found. aarch64 has a "ccmp" instruction that can be used to evaluate >> two conditions with only one branch. On an out-of-order >> implementation with more integer execution units than branch units, >> the trading of a branch for a ccmp can be beneficial. The downside >> is that one has to check, after the loop has exited, which of the >> conditions cause the loop to exit, but if the loop executes more than >> once or twice, that is still a win. >> >> There are other opportunities to use cmp;ccmp;br instead of >> cmp;br;cmp;br. I happened to see the one in >> MacroAssembler::lookup_interface_method because it was in what passes >> for hand-written assembler in HotSpot. For generic searches for a >> key in a key-value array the improvement can be ~10% on an Ampere >> Altra, depending on how far down the key-value array one has to look. >> >> I am only proposing to fix the loop in >> MacroAssembler::lookup_interface_method, but I would be interested in >> talking to people about where else the ccmp style could be applied. >> >> ... peter >> >
From aph-open at littlepinkcloud.com Sun Apr 30 10:06:38 2023 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sun, 30 Apr 2023 11:06:38 +0100 Subject: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch In-Reply-To: References: Message-ID: On 4/29/23 01:18, Peter Kessler OS wrote: > I notice that src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp MacroAssembler::lookup_interface_method loops over the itable list with code that uses two branches: one to check for a null indicating the end of the list, and one to see if the appropriate entry has been found. aarch64 has a "ccmp" instruction that can be used to evaluate two conditions with only one branch. On an out-of-order implementation with more integer execution units than branch units, the trading of a branch for a ccmp can be beneficial. The downside is that one has to check, after the loop has exited, which of the conditions cause the loop to exit, but if the loop executes more than once or twice, that is still a win. I doubt that it'd be a win, but maybe. On out-of-order AArch64 boxes I know, branch prediction tends to be very effective, so it won't make much difference. Also, CCMP is error prone, being difficult to read and write. Unless there's a significant advantage I wouldn't do it. Benchmarking might be hard to do, though. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
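
The two loop shapes discussed in this thread (cmp;br;cmp;br versus cmp;ccmp;br) can be sketched at the source level in C++. This is only an illustration under stated assumptions: `Entry`, `find_two_branches` and `find_combined` are hypothetical names, the table is assumed to be terminated by an entry with a null key, and nothing here is the HotSpot itable code. Whether a compiler actually emits `ccmp` for the second form is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical key-value entry; a null key marks the end of the table.
struct Entry {
  const void* key;
  intptr_t value;
};

// Shape 1: two exit tests per iteration (one for end-of-list, one for a match).
const Entry* find_two_branches(const Entry* e, const void* wanted) {
  for (;; ++e) {
    if (e->key == nullptr) return nullptr;  // reached the terminator
    if (e->key == wanted) return e;         // found the entry
  }
}

// Shape 2: both conditions folded into one loop test (non-short-circuit '&'),
// then a single check after the loop to see which condition ended it.
const Entry* find_combined(const Entry* e, const void* wanted) {
  while ((e->key != nullptr) & (e->key != wanted)) {
    ++e;
  }
  // The loop stops on the terminator or on a match; a non-null key means a match.
  return e->key != nullptr ? e : nullptr;
}
```

The second form pays for the extra check after the loop, which is the downside mentioned in the original proposal; the replies above suggest that on cores with effective branch prediction the difference may be small or even negative.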