From stuefe at openjdk.org Wed May 1 07:01:18 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 1 May 2024 07:01:18 GMT
Subject: [master] Withdrawn: 8325104: Lilliput: Shrink Classpointers
In-Reply-To:
References:
Message-ID: <2TCZL-x81p6x-Z4L-hTFzicP7Qe8TfjjzdSdUGsVFk4=.d300755b-d577-4d8d-8b6b-111a5e9a6642@github.com>

On Thu, 1 Feb 2024 10:23:04 GMT, Thomas Stuefe wrote:

> Hi,
>
> I wanted to get input on the following improvement for Lilliput. Testing is still ongoing, but things look really good, so this patch is hopefully near its final form (barring any objections from reviewers, of course).
>
> Note: I have a companion patch prepared for upstream, minus the markword changes. I will attempt to get that one upstream quickly in order to not have a large delta between upstream and lilliput, especially in Metaspace.
>
> ## High-Level Overview
>
> (for a short sequence of slides, please see https://github.com/tstuefe/fosdem24/blob/master/classpointers-and-liliput.pdf - these accompanied a talk we held at FOSDEM 24).
>
> We want to reduce the bit size of narrow Klass to free up bits in the MarkWord.
>
> We cannot just reduce the Klass encoding range size (well, we could, and maybe we will later, but for now we decided not to). We instead increase the alignment Klass is stored at, and use that alignment shadow to store other information.
>
> In other words, this patch changes the narrow Klass Pointer to a Klass ID, since now (almost) every value in its value range points to a different class. Therefore, we use the value range of nKlass much more efficiently.
>
> We then use the newly freed bits in the MarkWord to restore the iHash to 31 bits:
>
>
> [ 22-bit nKlass | 31-bit iHash | 4 free bits | age | fwd | lck ]
>
> nKlass gets reduced to 22 bits. Identity hash gets re-inflated to 31 bits. Preceding iHash are now 4 unused bits. Rest is unchanged.
>
> (Note: I originally wanted to swap iHash and nKlass such that either of them could be loaded with a 32-bit load, but I found that tricky since C2 seems to rely on the nKlass offset in the Markword being > 0.)
>
> ## nKlass reduction:
>
> The reduction in nKlass size is made by only storing them at 10-bit aligned addresses. That alignment (1KB) works well in practice since Klass - although var sized - typically is between 512 bytes and 1KB in size. Outliers are possible, but the size distribution is bell-curvish [1], so far-away outliers are very rare.
>
> To not lose memory to alignment waste, metaspace is reshaped to handle arbitrarily aligned allocations efficiently. Basically, we allow the non-Klass arena of a class loader to steal the alignment waste storage from the class arena. So, alignment waste blocks are filled with non-Klass metadata. That works very well in practice since non-Klass metadata is numerous and fine-granular compared to the big Klass blocks. Total footprint loss in metaspace is, therefore, almost ...

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/lilliput/pull/128

From rkennke at openjdk.org Thu May 2 10:53:20 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 2 May 2024 10:53:20 GMT
Subject: [lilliput-jdk21u:lilliput] Integrated: 8330849: Add test to verify memory usage with recursive locking
In-Reply-To:
References:
Message-ID:

On Tue, 30 Apr 2024 16:27:04 GMT, Roman Kennke wrote:

> Hi all,
>
> This pull request contains a backport of commit [7b2560b4](https://github.com/openjdk/jdk/commit/7b2560b4904d80629d3f4f25c65d9b96eee9bdb6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Roman Kennke on 24 Apr 2024 and was reviewed by Leonid Mesnik and Aleksey Shipilev.
>
> Thanks!

This pull request has now been integrated.

Changeset: 20dbbeaf
Author: Roman Kennke
URL: https://git.openjdk.org/lilliput-jdk21u/commit/20dbbeafc702bdfdf9f51d83f09564e71b66dd25
Stats: 95 lines in 1 file changed: 95 ins; 0 del; 0 mod

8330849: Add test to verify memory usage with recursive locking

Backport-of: 7b2560b4904d80629d3f4f25c65d9b96eee9bdb6

-------------

PR: https://git.openjdk.org/lilliput-jdk21u/pull/32

From rkennke at openjdk.org Tue May 14 10:57:45 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 14 May 2024 10:57:45 GMT
Subject: [master] RFR: Exclude some tests from Lilliput which rely on array alignment
Message-ID:

On x86_64, some IR tests are failing with +UCOH, because they expect a certain array alignment. Let's not run those tests with +UCOH, just like we do with other similar tests.

Testing:
- [x] compiler/loopopts/superword/TestMulAddS2I.java

-------------

Commit messages:
- Exclude some tests from Lilliput which rely on array alignment

Changes: https://git.openjdk.org/lilliput/pull/173/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=173&range=00
Stats: 15 lines in 1 file changed: 10 ins; 0 del; 5 mod
Patch: https://git.openjdk.org/lilliput/pull/173.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/173/head:pull/173

PR: https://git.openjdk.org/lilliput/pull/173

From shade at openjdk.org Wed May 15 09:40:28 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 15 May 2024 09:40:28 GMT
Subject: [master] RFR: Exclude some tests from Lilliput which rely on array alignment
In-Reply-To:
References:
Message-ID:

On Tue, 14 May 2024 10:52:39 GMT, Roman Kennke wrote:

> On x86_64, some IR tests are failing with +UCOH, because they expect a certain array alignment. Let's not run those tests with +UCOH, just like we do with other similar tests.
>
> Testing:
> - [x] compiler/loopopts/superword/TestMulAddS2I.java

Looks fine. Have you tried to run that test without opts, and TEST_VM_OPTS=+|-UCOH?

-------------

Marked as reviewed by shade (Committer).

PR Review: https://git.openjdk.org/lilliput/pull/173#pullrequestreview-2057462865

From rkennke at openjdk.org Thu May 16 08:39:31 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 16 May 2024 08:39:31 GMT
Subject: [master] RFR: Exclude some tests from Lilliput which rely on array alignment
In-Reply-To:
References:
Message-ID:

On Wed, 15 May 2024 09:37:31 GMT, Aleksey Shipilev wrote:

> Looks fine. Have you tried to run that test without opts, and TEST_VM_OPTS=+|-UCOH?

Thanks! Yes, all combinations are passing.

-------------

PR Comment: https://git.openjdk.org/lilliput/pull/173#issuecomment-2114497735

From rkennke at openjdk.org Thu May 16 09:48:20 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 16 May 2024 09:48:20 GMT
Subject: [master] Integrated: Exclude some tests from Lilliput which rely on array alignment
In-Reply-To:
References:
Message-ID: <2cOve-mwFh3jJn6AA7e6HQ3xtVD6fUDK74wQbUKPeJw=.b1f92fb5-ee13-479f-b93a-a2e15d63640b@github.com>

On Tue, 14 May 2024 10:52:39 GMT, Roman Kennke wrote:

> On x86_64, some IR tests are failing with +UCOH, because they expect a certain array alignment. Let's not run those tests with +UCOH, just like we do with other similar tests.
>
> Testing:
> - [x] compiler/loopopts/superword/TestMulAddS2I.java
> - [x] compiler/loopopts/superword/TestMulAddS2I.java (-UCOH)
> - [x] compiler/loopopts/superword/TestMulAddS2I.java (+UCOH)

This pull request has now been integrated.

Changeset: 75b9c85a
Author: Roman Kennke
URL: https://git.openjdk.org/lilliput/commit/75b9c85ad332c70a225398d14f9645aed35bece7
Stats: 15 lines in 1 file changed: 10 ins; 0 del; 5 mod

Exclude some tests from Lilliput which rely on array alignment

Reviewed-by: shade

-------------

PR: https://git.openjdk.org/lilliput/pull/173

From aboldtch at openjdk.org Thu May 23 06:30:52 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 23 May 2024 06:30:52 GMT
Subject: [master] RFR: OMWorld: reenable all platforms
Message-ID:

This reenables all platforms to use OMWorld, and by extension UseCompactObjectHeaders.

This change simply calls the runtime if a lock is inflated, until port support for OMWorld cache lookup is added.

ARM (32-bit) required no changes as it already always called the runtime when a monitor is inflated.

-------------

Commit messages:
- Fast lock in quick_enter
- Reenable arm/riscv/ppc/s390

Changes: https://git.openjdk.org/lilliput/pull/174/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=174&range=00
Stats: 204 lines in 5 files changed: 32 ins; 125 del; 47 mod
Patch: https://git.openjdk.org/lilliput/pull/174.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/174/head:pull/174

PR: https://git.openjdk.org/lilliput/pull/174

From aboldtch at openjdk.org Thu May 23 06:37:23 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 23 May 2024 06:37:23 GMT
Subject: [master] RFR: OMWorld: Decouple deflation and table sizing
Message-ID:

The change reverts all changes to deflation and moves the resizing of the OMWorld ConcurrentHashTable to the service thread, using logic similar to how we resize the Symbol- and StringTables.

The option to shrink the table is taken out and can be reintroduced at a later date as an enhancement. To do it correctly, the interactions with deflation need to be figured out.

-------------

Commit messages:
- Decouple deflation and table sizing

Changes: https://git.openjdk.org/lilliput/pull/175/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=175&range=00
Stats: 206 lines in 7 files changed: 65 ins; 91 del; 50 mod
Patch: https://git.openjdk.org/lilliput/pull/175.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/175/head:pull/175

PR: https://git.openjdk.org/lilliput/pull/175

From aboldtch at openjdk.org Thu May 23 06:41:49 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 23 May 2024 06:41:49 GMT
Subject: [master] RFR: OMWorld: Remove OMRecursiveFastPath
Message-ID:

The `OMRecursiveFastPath` was an experiment that was introduced when recursive lightweight locking was developed. It could show some gains in some scenarios on specific hardware, but remove it for now. Checking for recursion via a failed CAS that reads out the owner is good enough.

-------------

Commit messages:
- Remove OMRecursiveFastPath

Changes: https://git.openjdk.org/lilliput/pull/176/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=176&range=00
Stats: 32 lines in 3 files changed: 0 ins; 26 del; 6 mod
Patch: https://git.openjdk.org/lilliput/pull/176.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/176/head:pull/176

PR: https://git.openjdk.org/lilliput/pull/176

From aboldtch at openjdk.org Thu May 23 06:58:35 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 23 May 2024 06:58:35 GMT
Subject: [master] RFR: OMWorld: Spin Changes
Message-ID: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com>

The fast lock spinning uses `sched_yield`, which tends to be discouraged for spin locking code. Instead, only use `SpinPause` with exponential backoff: after each failed CAS, wait exponentially longer before trying again, in an attempt to reduce cache contention.

This change also makes the spinning aware of safepoints, and tries to fast track the execution to the next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor.

Have not removed `OMSpins` yet, as the exact value is not determined yet. It may have to be platform specific, as `SpinPause` has different characteristics on different hardware. OMSpins is the number of fast-lock attempts, with each attempt spinning for twice as long as the last, so the total number of spins is on the order of O(2^OMSpins).
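
The backoff scheme described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: `spin_pause_stub`, `try_lock_with_backoff` and `total_spins` are hypothetical names, the lock word is a plain `std::atomic<int>`, and the stub does nothing where HotSpot would call `SpinPause`. The sketch also leaves out the safepoint awareness mentioned in the description.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for HotSpot's SpinPause(); here it is a no-op.
inline void spin_pause_stub() {}

// Total pause iterations if all `attempts` CAS attempts fail:
// 1 + 2 + 4 + ... + 2^(attempts-1) = 2^attempts - 1.
constexpr std::uint64_t total_spins(unsigned attempts) {
  return (std::uint64_t{1} << attempts) - 1;
}

// Try to flip `lock_word` from 0 (unlocked) to 1 (locked). After each failed
// compare-and-swap, pause twice as long as after the previous failure.
inline bool try_lock_with_backoff(std::atomic<int>& lock_word, unsigned max_attempts) {
  std::uint64_t pauses = 1;
  for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
    int expected = 0;
    if (lock_word.compare_exchange_strong(expected, 1, std::memory_order_acquire)) {
      return true;  // lock acquired
    }
    for (std::uint64_t i = 0; i < pauses; ++i) {
      spin_pause_stub();
    }
    pauses *= 2;  // exponential backoff
  }
  return false;  // caller would fall back to the slow path
}
```

Under these assumptions, 7 attempts add up to 127 pause iterations and 14 attempts to 16383, which is where the "order of 2^OMSpins" estimate comes from.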
It will probably land somewhere in the range of 7-14 (128-16384 spins).

-------------

Commit messages:
- Only spin when os::is_MP()
- Spin Changes

Changes: https://git.openjdk.org/lilliput/pull/177/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=177&range=00
Stats: 36 lines in 2 files changed: 24 ins; 6 del; 6 mod
Patch: https://git.openjdk.org/lilliput/pull/177.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/177/head:pull/177

PR: https://git.openjdk.org/lilliput/pull/177

From aboldtch at openjdk.org Thu May 23 07:03:18 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 23 May 2024 07:03:18 GMT
Subject: [master] RFR: OMWorld: Cleanups
Message-ID:

This contains a handful of miscellaneous cleanups: removing dead code, fixing and cleaning up TODOs, cleaning up logging, and removing all special handling of `x86_32` not having a thread register / lacking registers.

-------------

Commit messages:
- Fix logging
- Cleanup / polish

Changes: https://git.openjdk.org/lilliput/pull/178/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=178&range=00
Stats: 89 lines in 14 files changed: 5 ins; 66 del; 18 mod
Patch: https://git.openjdk.org/lilliput/pull/178.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/178/head:pull/178

PR: https://git.openjdk.org/lilliput/pull/178

From aboldtch at openjdk.org Thu May 23 14:34:37 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 23 May 2024 14:34:37 GMT
Subject: [master] RFR: OMWorld: Cleanups [v2]
In-Reply-To:
References:
Message-ID:

> This contains a handful of miscellaneous cleanups: removing dead code, fixing and cleaning up TODOs, cleaning up logging, and removing all special handling of `x86_32` not having a thread register / lacking registers.

Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:

  LockStackInflateContendedLocks only used for current thread

-------------

Changes:
- all: https://git.openjdk.org/lilliput/pull/178/files
- new: https://git.openjdk.org/lilliput/pull/178/files/9d01cafd..944d015a

Webrevs:
- full: https://webrevs.openjdk.org/?repo=lilliput&pr=178&range=01
- incr: https://webrevs.openjdk.org/?repo=lilliput&pr=178&range=00-01

Stats: 6 lines in 1 file changed: 1 ins; 0 del; 5 mod
Patch: https://git.openjdk.org/lilliput/pull/178.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/178/head:pull/178

PR: https://git.openjdk.org/lilliput/pull/178

From thomas.stuefe at gmail.com Thu May 23 15:20:21 2024
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 23 May 2024 17:20:21 +0200
Subject: Solving the Klass hyperalignment problem
Message-ID:

Hi all,

I would like help deciding on the best mitigation strategy for Lilliput's Klass hyperalignment problem. Since it has wide effects (e.g. a possible removal of class space), I'd like to base the next steps on consensus.

(a more readable version of this, in markdown, is here: https://gist.github.com/tstuefe/6d8c4a40689c34b12f79442a8469504e).

1. Background

We store class information in Klass, and resolving Klass from oop is a hot path. One example is GC: During GCs, we churn through tons of objects and need to get at least object size (layout helper) and Oopmap from Klass frequently. Therefore, the way we resolve a nKlass matters for performance.

Today (non-Lilliput), we go from Object to Klass (with compressed class pointers) this way: We pluck the nKlass from the word adjacent to the MW. We then calculate Klass* from nKlass by - typically - just adding the encoding base as immediate. We may or may not omit that add, and we may or may not shift the nKlass, but the most typical case (CDS enabled) is just the addition.
Today's decoding does not need a single memory access; it can happen in registers only.

In Lilliput, the nKlass lives in the MW (which allows us to load nKlass with the same load that loads the MW). Therefore, nKlass needs to shrink.

The problem with the classic 32-bit nKlass is that its value range is not used effectively. Klass structures tend to be large, on average 500-700 bytes [1], and that means a lot of values in that 32-bit range are "blind" - point into the middle of a class - and are hence wasted. Ideally, one would want to use one nKlass value per class.

In Lilliput, we reduced nKlass to 22 bits. We do this by placing Klass structures only on 1KB-aligned addresses. Therefore, the lower 10 bits are 0 and can be shifted out. 1KB was chosen as a middle ground that allows us to use both the nKlass value range and the Klass encoding range (4GB) effectively.

2. The Problem

By keeping Klass structures 1KB-aligned, we march in lockstep with respect to CPU caches. With a cache line size of 64 bytes (6 bits) and an alignment of 1KB (10 bits), we lose 4 bits of entropy. Therefore, loads from a Klass structure have a high chance of evicting earlier loads from other Klass structures. Depending on saturation and number of cache ways, we may only use a 16th of the caches.

We see this effect clearly, especially in GC pause times. The bad cache behavior is clearly noticeable and needs to be solved.

3. The solutions

3.1. Short-term mitigation: Increasing the nKlass size to 26 bits

A simple short-term mitigation for Lilliput Milestone 1 is just increasing the nKlass size. By reducing the nKlass size to 22 bits, we freed up 10 bits, but to date, we only use 6 of them. For now, we have 4 spare bits in the header. We could increase the nKlass to 26 bits and work with a shift of 6 bits instead of 10 bits. That would require no additional work. Klass would be aligned to just 64 bytes, so the cache performance problems would disappear.
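
The arithmetic behind the 22-bit/10-bit and 26-bit/6-bit variants can be checked with a small sketch. The helper names (`decode_shift`, `encoding_range`, `klass_alignment`) are made up for illustration and are not HotSpot functions; the sketch only assumes that decoding is base plus shifted nKlass, as described above.

```cpp
#include <cstdint>

// Illustrative only: decode a narrow Klass value into an address as
// base + (nKlass << shift). Everything happens in registers; no load is needed.
constexpr std::uint64_t decode_shift(std::uint64_t base, std::uint32_t nklass, unsigned shift) {
  return base + (static_cast<std::uint64_t>(nklass) << shift);
}

// Size in bytes of the address range reachable with `bits` nKlass bits
// and the given shift.
constexpr std::uint64_t encoding_range(unsigned bits, unsigned shift) {
  return std::uint64_t{1} << (bits + shift);
}

// Klass alignment in bytes implied by the shift.
constexpr std::uint64_t klass_alignment(unsigned shift) {
  return std::uint64_t{1} << shift;
}

// Both variants cover the same 4GB encoding range; only the alignment differs.
static_assert(encoding_range(22, 10) == encoding_range(26, 6), "same range");
static_assert(klass_alignment(10) == 1024, "1KB alignment");
static_assert(klass_alignment(6) == 64, "cache-line alignment");
```

In other words, moving 4 bits from the shift into the nKlass keeps the 4GB encoding range intact while relaxing the Klass alignment from 1KB to one 64-byte cache line.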
Note, however, that we earmarked those 4 spare bits for Valhalla's use. Therefore, reverting to a 26-bit nKlass can only be an intermediate step. And, obviously, it won't do for 32-bit headers.

3.2 Use a pointer indirection table

This idea resurfaces every couple of years. The idea is to replace the class space with an indirection pointer table. In this approach, a nKlass would be an index into a global pointer table, and that pointer table contains the real Klass* pointers.

The enticing part of this approach is that we could throw away the class space and a bunch of supporting code. Klass structures could live in normal Metaspace like all the other data. However, we would need some new code to maintain the global Klass* table, recycle empty pointer slots after class unloading, etc.

Decoding a nKlass would mean:
- load the nKlass from the object
- load Klass* from the indirection table at the index nKlass points to

The approach would solve the cache problem described above since removing any alignment requirement from Klass structures allows us to place them wherever we like (e.g., in standard Metaspace), and their start addresses would not march in lockstep.

However, it introduces a new cache problem since we now have a new load in the hot decoding path. And the Klasspointer table can only be improved so much: only 8 uncompressed pointers fit into a cache line. From a certain number of classes, subsequent table accesses will have little spatial locality.

3.3 Place Klass on alternating cache lines

Originally brought up by John Rose [2] when we did the first iteration of 22-bit class pointers in Lilliput. The idea is to alter the locations of Klass structures by cache line size. There is nothing that forces us to use a power-of-two stride. We can use any odd multiple of the cache line size that we like. For example, 11 cache lines (704 bytes) would mean that Klass structures would come to be located on different cache lines.
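
To make the lockstep effect concrete, here is a small sketch that counts how many distinct cache sets the start addresses of Klass structures fall into for a given stride. The cache geometry is an assumption chosen for illustration (64-byte lines and 64 sets, as in a 32KB 8-way L1 data cache); it is not taken from the measurements in this mail, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed cache geometry, for illustration only: 64-byte lines, 64 sets.
constexpr std::uint64_t kLineSize = 64;
constexpr std::uint64_t kNumSets = 64;

// Cache set that the line containing `addr` maps to.
constexpr std::uint64_t cache_set_of(std::uint64_t addr) {
  return (addr / kLineSize) % kNumSets;
}

// Number of distinct cache sets hit by the first line of `count` Klass
// structures placed `stride` bytes apart, starting at address 0.
inline std::size_t distinct_sets(std::uint64_t stride, std::uint64_t count) {
  std::set<std::uint64_t> sets;
  for (std::uint64_t i = 0; i < count; ++i) {
    sets.insert(cache_set_of(i * stride));
  }
  return sets.size();
}

// Decoding with a non-power-of-two stride: multiply, then add the base.
// Still register-only arithmetic, no memory load.
constexpr std::uint64_t decode_stride(std::uint64_t base, std::uint32_t nklass,
                                      std::uint64_t stride) {
  return base + static_cast<std::uint64_t>(nklass) * stride;
}
```

With this geometry, a 1024-byte stride only ever touches 4 of the 64 sets (a 16th), while a 704-byte stride (11 lines, and 11 shares no factor with 64) eventually touches all 64.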

With a non-pow2 stride, decoding becomes a bit more complex. We cannot use a shift; we need to do an integer multiplication:
- multiply nKlass with 704
- add base.

But all of this can still happen in registers only. No memory load is needed.

4. The prototypes

I wanted to measure the performance impacts of all approaches. So I compared four JVMs:
- A) (the unmitigated case) a Lilliput JVM as it is today: 22-bit nKlass, 10-bit shift
- B) a Lilliput JVM that uses a 26-bit nKlass with a 6-bit shift
- C) a Lilliput JVM that uses a 22-bit nKlass and a Klass pointer indirection table [3]
- D) a Lilliput JVM that uses a 22-bit nKlass and a non-pow2 alignment of Klass of 704 [4]

It turned out that Coleen also wrote a prototype with a Klass pointer indirection table [5], but that is identical to mine (C) in all relevant points. The only difference is that Coleen based it on the mainline JVM; mine is based on Lilliput. But I repeated all my tests with Coleen's prototype.

5. The Tests

I ran both SpecJBB2015 and a custom-written Microbenchmark [6].

The microbenchmark was designed to stress Object-to-Klass dereferencing during GC. It fills the heap with many objects of randomly chosen classes. It keeps those objects alive in an array. It then executes several Full GCs and sums up all GC pause times. Walking these objects forces the GCs to de-reference many different nKlass values.

The Microbenchmark was run on a Ryzen Zen 2, the SpecJBB on an older i7-4770, and Coleen's prototype I also tested on a Raspberry 4. The tests were isolated to 8 cores (well, apart from the Raspberry), and I tried to minimize scheduler interference. The microbenchmark results were pretty stable, but the SpecJBB2015 results fluctuated - despite my attempts at stabilizing them.

I repeated the Microbenchmark for a number of classes (512..16384) and three different collectors (Serial, Parallel, G1).

6. The Results

The microbenchmark shows clear and stable results. SpecJBB fluctuated more.

6.1 Microbenchmark, G1GC

See graph [7].

(A) - the unmitigated 22-bit version showed overall the worst performance, with a 41% increase over the best performer (B, the 26-bit version) at 16k classes.
(B) - best performance
(C) - the klass pointer table seemed overall the worst of the three mitigation prototypes. For 4k..8k classes, even worse than the unmitigated case (A). Maxes out at +36% over the best performer (B) at 16k classes.
(D) - second best performance, for certain class ranges even best. Maxes out at +11% at 16k classes.

I wondered why (D) could be better than (B). My assumption is that with (D), we go out of our way to choose different cache lines. With (B), the cache line chosen is "random" and may be subject to allocation pattern artifacts of the underlying allocator.

6.2 Microbenchmark, ParallelGC

See graphs [8].

Differences are less pronounced, but the results are similar. (B) better than (D) better than (C).

6.3 Microbenchmark, SerialGC

See graphs [9].

Again, the same result. Here the deltas are most pronounced; the cache inefficiency, measured via GC pauses, is most apparent.

6.4 Coleen's prototype

I repeated the measurements with Coleen's prototype (with G1), comparing it against the same JVM with the klass table switched off. No surprises, similar behavior to (C) vs (B). See [10].

I also did a run with perf to measure L1 misses, and we see up to 32% more L1 cache misses with the klass pointer table [11].

6.5 SpecJBB2015

SpecJBB results were quite volatile despite my stabilization efforts. The deltas between maxJOps and critJOps did not rise above random noise.

The GC pause times showed (percentage numbers, compared with (A)==100%):

| Run |    1    |    2    |   3    |
|-----|---------|---------|--------|
| B   | 108.68% |  95.88% | 97.01% |
| C   | 104.26% |  92.15% | 90.25% |
| D   |  96.20% |  94.31% | 86.44% |

(all with G1GC)

Again, (D) seems to perform best.
I am unsure what the problem is with (B) here (the 26-bit class pointer approach) since it seems to perform worst in all cases. I will look into that.

7. Side considerations

7.1 Running out of class space/addressable classes?

This issue does not affect the number of addressable classes much. That one is limited by the size of nKlass. With a 22-bit nKlass, that is about 4 million. We can only work beyond that with the concept of near- vs far classes suggested by John Rose. In any case, that is not the focus here.

The class-space-based approaches (A), (B), and (D) also have a soft limit in that the number of Klass we can store is limited by the size of the encoding range (4GB). However, that is a rather soft limit because no hard technical reason prevents us from having a larger encoding range. The 4GB limitation exists only because we optimize the addition of the base immediate by using e.g. 16-bit moves on some platforms, so the nKlass must not extend beyond bit 31. We may be able to do that differently. Note that 4GB is also really large for Klass data.

7.2 Reducing the number of Klass structures addressable via nKlass

Coleen had a great idea: classes that are never instantiated (e.g., Lambda Forms) don't need a nKlass at all. They, therefore, don't have to live within the Klass encoding range.

That would be a great improvement since these classes are typically generated, and their number is unpredictable. By removing this kind of class from the equation, the question of the number of addressable classes becomes a lot more relaxed.

8. Conclusions

For the moment, I prefer (D) (the uneven-cache-lines-approach). It shows the overall best performance, in parts even outperforming the 26-bit approach.

The Klasspointer-Table approach (C) would be nice since we could eliminate class space and a lot of coding that goes with it. That would reduce complexity. But the additional load in hot decoding paths hurts. There is also the vague fear of not being future-proof.
I am apprehensive about sacrificing Klass resolving performance since Klass lookup seems to be something we will always do.

That said, all input (especially from you, Coleen!) is surely welcome.

# Materials

All test results, tests, etc can be found here: [12]

Thanks, Thomas

- [1] [Allocation Histogram for Klass- and Non-Klass Allocations](https://raw.githubusercontent.com/tstuefe/metaspace-statistics/ab3625e041d42243039f37983969ac8b770a9f4a/Histogram.svg)
- [2] https://github.com/openjdk/lilliput/pull/13#issuecomment-988456995
- [3] [Klasstable prototype](https://github.com/tstuefe/lilliput/tree/lilliput-with-Klass-indirection-table)
- [4] [Uneven alignment prototype](https://github.com/tstuefe/lilliput/tree/lilliput-with-staggered-Klass-alignment)
- [5] [Klass table prototype, Coleen](https://github.com/openjdk/jdk/pull/19272)
- [6] [The Microbenchmark](https://github.com/tstuefe/test-hyperaligning-lilliput/blob/master/microbenchmark/the-test/src/main/java/de/stuefe/repros/metaspace/InterleaveKlassRefsInHeap.java)
- [7] [Graph: G1 microbench, absolute GC pauses](https://raw.githubusercontent.com/tstuefe/test-hyperaligning-lilliput/e55025622ec3574b0252ff82d860e63603a9f7df/microbenchmark/archived/results-2024-05-14T15-36-27-CEST/G1GC-abs-pauses.svg)
- [8] [Graph: ParallelGC microbench, absolute GC pauses](https://raw.githubusercontent.com/tstuefe/test-hyperaligning-lilliput/e55025622ec3574b0252ff82d860e63603a9f7df/microbenchmark/archived/results-2024-05-14T15-36-27-CEST/ParallelGC-abs-pauses.svg)
- [9] [Graph: SerialGC microbench, absolute GC pauses](https://raw.githubusercontent.com/tstuefe/test-hyperaligning-lilliput/e55025622ec3574b0252ff82d860e63603a9f7df/microbenchmark/archived/results-2024-05-14T15-36-27-CEST/SerialGC-abs-pauses.svg)
- [10] [Graph: G1 microbench, absolute GC pauses, alternate klasstable prototype](
https://raw.githubusercontent.com/tstuefe/test-hyperaligning-lilliput/6fb9d64fff930e16154c4b3fab9a24169b021de8/microbenchmark/archived/coleen-kptable-results-2024-05-21T11-14-31-CEST/g1gc-pauses-absolute.svg )
- [11] [Graph: G1 microbench, L1 misses, alternate klasstable prototype](https://raw.githubusercontent.com/tstuefe/test-hyperaligning-lilliput/6fb9d64fff930e16154c4b3fab9a24169b021de8/microbenchmark/archived/coleen-kptable-results-2024-05-21T11-14-31-CEST/g1gc-l1misses-absolute.svg)
- [12] https://github.com/tstuefe/test-hyperaligning-lilliput/tree/master

From coleenp at openjdk.org Thu May 23 18:59:15 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 23 May 2024 18:59:15 GMT
Subject: [master] RFR: OMWorld: reenable all platforms
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 06:26:20 GMT, Axel Boldt-Christmas wrote:

> This reenables all platforms to use OMWorld, and by extension UseCompactObjectHeaders.
>
> This change simply calls the runtime if a lock is inflated, until port support for OMWorld cache lookup is added.
>
> ARM (32-bit) required no changes as it already always called the runtime when a monitor is inflated.

This is sort of awkward because right now, Lightweight locking is default, so going forward, if we put the OMWorld table on a switch, we'll have to revert this code. Would it be possible for this Lilliput patch to conditionalize the code on the UseCompactObjectHeaders instead?

-------------

Changes requested by coleenp (Committer).

PR Review: https://git.openjdk.org/lilliput/pull/174#pullrequestreview-2074778988

From coleenp at openjdk.org Thu May 23 19:27:23 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 23 May 2024 19:27:23 GMT
Subject: [master] RFR: OMWorld: Decouple deflation and table sizing
In-Reply-To:
References:
Message-ID: <7Jba4iymkB24XIx1Iz1C6Sd6fLdHelnqZBDm4cVwyyE=.ea72570d-cb04-4e7c-8c2e-2fea20f0b83a@github.com>

On Thu, 23 May 2024 06:32:28 GMT, Axel Boldt-Christmas wrote:

> The change reverts all changes to deflation and moves the resizing of the OMWorld ConcurrentHashTable to the service thread. Using a similar logic to how we resize the Symbol- and StringTables.
>
> The option to shrink the table is taken out and can be reintroduced at a later date as an enhancement. To do it correctly the interactions with deflation needs to be figured out.

This does simplify the table to act like the symbol and string tables, until we find we need to do something more complicated.

src/hotspot/share/runtime/serviceThread.cpp line 205:

> 203: if (omworldtable_work) {
> 204: LightweightSynchronizer::resize_table(jt);
> 205: }

The monitor deflation thread seems like it would be a better place to do the OMWorld table resizing, because it also does cleaning that might be a response to monitor deflation. The ServiceThread does a lot, and the OMWorld table might notice the stall. The other thread is also a service thread in a way, so I think this could be moved there instead.

-------------

PR Review: https://git.openjdk.org/lilliput/pull/175#pullrequestreview-2074817118
PR Review Comment: https://git.openjdk.org/lilliput/pull/175#discussion_r1612193671

From coleenp at openjdk.org Thu May 23 19:27:23 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 23 May 2024 19:27:23 GMT
Subject: [master] RFR: OMWorld: Decouple deflation and table sizing
In-Reply-To: <7Jba4iymkB24XIx1Iz1C6Sd6fLdHelnqZBDm4cVwyyE=.ea72570d-cb04-4e7c-8c2e-2fea20f0b83a@github.com>
References: <7Jba4iymkB24XIx1Iz1C6Sd6fLdHelnqZBDm4cVwyyE=.ea72570d-cb04-4e7c-8c2e-2fea20f0b83a@github.com>
Message-ID:

On Thu, 23 May 2024 19:18:17 GMT, Coleen Phillimore wrote:

>> The change reverts all changes to deflation and moves the resizing of the OMWorld ConcurrentHashTable to the service thread. Using a similar logic to how we resize the Symbol- and StringTables.
>>
>> The option to shrink the table is taken out and can be reintroduced at a later date as an enhancement. To do it correctly the interactions with deflation needs to be figured out.
>
> src/hotspot/share/runtime/serviceThread.cpp line 205:
>
>> 203: if (omworldtable_work) {
>> 204: LightweightSynchronizer::resize_table(jt);
>> 205: }
>
> The monitor deflation thread seems like it would be a better place to do the OMWorld table resizing, because it also does cleaning that might be a response to monitor deflation. The ServiceThread does a lot, and the OMWorld table might notice the stall. The other thread is also a service thread in a way, so I think this could be moved there instead.

And when we do add shrinking, there'll be interactions with things that the monitor deflation thread is doing.

-------------

PR Review Comment: https://git.openjdk.org/lilliput/pull/175#discussion_r1612194388

From coleenp at openjdk.org Thu May 23 19:29:14 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 23 May 2024 19:29:14 GMT
Subject: [master] RFR: OMWorld: Remove OMRecursiveFastPath
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 06:37:53 GMT, Axel Boldt-Christmas wrote:

> The `OMRecursiveFastPath` was an experiment that was introduced when recursive lightweight was developed. It could show some gains in some scenarios on specific hardware, but remove it for now. Checking for recursion via a failed CAS that reads out the owner is good enough.

This changes the lilliput code to match mainline here, right?

-------------

Marked as reviewed by coleenp (Committer).

PR Review: https://git.openjdk.org/lilliput/pull/176#pullrequestreview-2074829775

From coleenp at openjdk.org Thu May 23 19:41:18 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 23 May 2024 19:41:18 GMT
Subject: [master] RFR: OMWorld: Spin Changes
In-Reply-To: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com>
References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com>
Message-ID:

On Thu, 23 May 2024 06:54:51 GMT, Axel Boldt-Christmas wrote:

> The fast lock spinning uses `sched_yield` which tends to be discouraged for spin locking code. Instead only use `SpinPause` with exponential backoff. Where after each failed CAS wait for exponentially more time until trying again in an attempt to reduce cache contention.
>
> This change also makes the spinning aware of safepoints, and tries to fast track the execution to next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor.
>
> Have not removed `OMSpins` yet, as the exact value is not determined yet.
It may have to be platform specific as `SpinPause` have different characteristics on different hardware. OMSpins is the number of fast lock, with each attempt spinning for twice as much as the last, so the total number of spins are on the order of O(2^OMSpins). It will probably land somewhere on the range of 7-14 (128 -16384 spins) src/hotspot/share/runtime/lightweightSynchronizer.cpp line 662: > 660: } > 661: > 662: mark = obj()->mark(); Is this the spinning to delay creating an ObjectMonitor for this lock as long as possible? Or if deflation is observed? This seems like it'd slow down the contended case, like xalan where we want a lot of the threads to quickly park. Or is this spinning because deflation is observed (like the comment says above)? Can you make 647-660 a function above this so it doesn't distract from the logic of the rest of enter so much? ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1612216078 From coleenp at openjdk.org Thu May 23 19:49:29 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 23 May 2024 19:49:29 GMT Subject: [master] RFR: OMWorld: Cleanups [v2] In-Reply-To: References:

Message-ID: <8av3jUwoRMZ3W2AoDNT55pW8RQYptLjcXxHSNKs3Ib4=.362fa40d-5ca4-495c-92a1-0efe5e51bb98@github.com> On Thu, 23 May 2024 14:34:37 GMT, Axel Boldt-Christmas wrote: >> This contains a handful of miscellaneous cleanups. Removing dead code, fixes/cleanups todos, cleanup logging and removing all special handling of `x86_32` not having a thread register / lacking registers. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > LockStackInflateContendedLocks only used for current thread These cleanups look good. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 502: > 500: ldr(t1_monitor, Address(box, BasicLock::object_monitor_cache_offset_in_bytes())); > 501: // null check with Flags == NE, no valid pointer below alignof(ObjectMonitor*) > 502: cmp(t1_monitor, checked_cast(alignof(ObjectMonitor*))); I don't know what this means. ------------- Marked as reviewed by coleenp (Committer). PR Review: https://git.openjdk.org/lilliput/pull/178#pullrequestreview-2074868422 PR Review Comment: https://git.openjdk.org/lilliput/pull/178#discussion_r1612224392 From aboldtch at openjdk.org Fri May 24 07:59:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 24 May 2024 07:59:22 GMT Subject: [master] RFR: OMWorld: reenable all platforms In-Reply-To: References:

Message-ID: On Thu, 23 May 2024 18:56:13 GMT, Coleen Phillimore wrote: > This is sort of awkward because right now, Lightweight locking is default, so going forward, if we put the OMWorld table on a switch, we'll have to revert this code. Would it be possible for this Lilliput patch to conditionalize the code on the UseCompactObjectHeaders instead? A UseCompactObjectHeaders condition will not work as you can still run LM_LIGHTWEIGHT with -UseCompactObjectHeaders. And that would crash. I was thinking this could be done when introducing the OMWorld flag. (And simply reintroduce the code this removes). The alternative is that I wait with this patch until you have introduced the UseOMWorld flag. And I can then update this to instead of removing code, condition the runtime call on the OMWorld flag. ------------- PR Comment: https://git.openjdk.org/lilliput/pull/174#issuecomment-2128841128 From aboldtch at openjdk.org Fri May 24 15:08:45 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 24 May 2024 15:08:45 GMT Subject: [master] RFR: OMWorld: Spin Changes [v2] In-Reply-To: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> Message-ID: > The fast lock spinning uses `sched_yield` which tends to be discouraged for spin locking code. Instead only use `SpinPause` with exponential backoff. Where after each failed CAS wait for exponentially more time until trying again in an attempt to reduce cache contention. > > This change also makes the spinning aware of safepoints, and tries to fast track the execution to next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor. > > Have not removed `OMSpins` yet, as the exact value is not determined yet. 
It may have to be platform specific as `SpinPause` have different characteristics on different hardware. OMSpins is the number of fast lock, with each attempt spinning for twice as much as the last, so the total number of spins are on the order of O(2^OMSpins). It will probably land somewhere on the range of 7-14 (128 -16384 spins) Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Cancel spinning early in case of inflation - Renamed first_time variable - Extract fast_lock_spin_enter ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/177/files - new: https://git.openjdk.org/lilliput/pull/177/files/e0001589..b021a26a Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=177&range=01 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=177&range=00-01 Stats: 92 lines in 2 files changed: 50 ins; 37 del; 5 mod Patch: https://git.openjdk.org/lilliput/pull/177.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/177/head:pull/177 PR: https://git.openjdk.org/lilliput/pull/177 From aboldtch at openjdk.org Fri May 24 15:08:46 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 24 May 2024 15:08:46 GMT Subject: [master] RFR: OMWorld: Spin Changes [v2] In-Reply-To: References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> Message-ID: On Thu, 23 May 2024 19:38:56 GMT, Coleen Phillimore wrote: > Is this the spinning to delay creating an ObjectMonitor for this lock as long as possible? Or if deflation is observed? This seems like it'd slow down the contended case, like xalan where we want a lot of the threads to quickly park. Or is this spinning because deflation is observed (like the comment says above)? There are two spinning loops here. The outer which uses the SpinYield which is only to do with deflation. And the inner when the object is fast_lock. But the first_time variable has a bad name. 
And with the new exponential backoff changes the spinning should be interrupted if inflation is observed. > Can you make 647-660 a function above this so it doesn't distract from the logic of the rest of enter so much? Good suggestion. Becomes much clearer. Extracted the spinning logic, renamed the variables and cleaned up the loop condition. ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1613626778 From aboldtch at openjdk.org Fri May 24 15:09:20 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 24 May 2024 15:09:20 GMT Subject: [master] RFR: OMWorld: Cleanups [v2] In-Reply-To: <8av3jUwoRMZ3W2AoDNT55pW8RQYptLjcXxHSNKs3Ib4=.362fa40d-5ca4-495c-92a1-0efe5e51bb98@github.com> References:

<8av3jUwoRMZ3W2AoDNT55pW8RQYptLjcXxHSNKs3Ib4=.362fa40d-5ca4-495c-92a1-0efe5e51bb98@github.com> Message-ID: On Thu, 23 May 2024 19:46:58 GMT, Coleen Phillimore wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> LockStackInflateContendedLocks only used for current thread > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 502: > >> 500: ldr(t1_monitor, Address(box, BasicLock::object_monitor_cache_offset_in_bytes())); >> 501: // null check with Flags == NE, no valid pointer below alignof(ObjectMonitor*) >> 502: cmp(t1_monitor, checked_cast(alignof(ObjectMonitor*))); > > I don't know what this means. Maybe I should say `valid ObjectMonitor* check` instead of `null check`. The reason that the check used to be against `2` is because some iterations of the code have signalled information in the lowest bit. Currently it could just be a check against `1` which would check for `nullptr` while also setting Flags to NE. The check is simply based on the fact that no valid ObjectMonitor* except nullptr can exist below `alignof(ObjectMonitor*)`. ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/178#discussion_r1613627124 From coleenp at openjdk.org Fri May 24 17:55:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 24 May 2024 17:55:20 GMT Subject: [master] RFR: OMWorld: Cleanups [v2] In-Reply-To: References:

<8av3jUwoRMZ3W2AoDNT55pW8RQYptLjcXxHSNKs3Ib4=.362fa40d-5ca4-495c-92a1-0efe5e51bb98@github.com> Message-ID: On Fri, 24 May 2024 15:06:20 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 502: >> >>> 500: ldr(t1_monitor, Address(box, BasicLock::object_monitor_cache_offset_in_bytes())); >>> 501: // null check with Flags == NE, no valid pointer below alignof(ObjectMonitor*) >>> 502: cmp(t1_monitor, checked_cast(alignof(ObjectMonitor*))); >> >> I don't know what this means. > > Maybe I should say `valid ObjectMonitor* check` instead of `null check`. > > The reason that the check used to be against `2` is because some iterations of the code have signalled information in the lowest bit. > > Currently it could just be a check agains `1` which would check for `nullptr` while also setting Flags to NE. > > The check is simply that based on the fact that no valid ObjectMonitor* except nullptr can exist below `alignof(ObjectMonitor*)`. But in the current code now, the field should be either nullptr or a valid ObjectMonitor, right? If you simply checked against nullptr, would it set the NE flags? All the extra casting etc makes this a puzzler where testing against nullptr does what it says. ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/178#discussion_r1613831179 From coleenp at openjdk.org Fri May 24 18:15:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 24 May 2024 18:15:13 GMT Subject: [master] RFR: OMWorld: Spin Changes [v2] In-Reply-To: References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> Message-ID: On Fri, 24 May 2024 15:08:45 GMT, Axel Boldt-Christmas wrote: >> The fast lock spinning uses `sched_yield` which tends to be discouraged for spin locking code. Instead only use `SpinPause` with exponential backoff. Where after each failed CAS wait for exponentially more time until trying again in an attempt to reduce cache contention. 
>> >> This change also makes the spinning aware of safepoints, and tries to fast track the execution to next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor. >> >> Have not removed `OMSpins` yet, as the exact value is not determined yet. It may have to be platform specific as `SpinPause` have different characteristics on different hardware. OMSpins is the number of fast lock, with each attempt spinning for twice as much as the last, so the total number of spins are on the order of O(2^OMSpins). It will probably land somewhere on the range of 7-14 (128 -16384 spins) > > Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: > > - Cancel spinning early in case of inflation > - Renamed first_time variable > - Extract fast_lock_spin_enter Some small comments requested, please, and to help check my understanding of this code. Thank you. src/hotspot/share/runtime/globals.hpp line 2001: > 1999: product(bool, OMShrinkCHT, false, "") \ > 2000: \ > 2001: product(int, OMSpins, 13, "") \ Can you add a comment for now, even if we move this somewhere internal (as a Knob_OMSpin??) src/hotspot/share/runtime/lightweightSynchronizer.cpp line 684: > 682: while (true) { > 683: // Fast-locking does not use the 'lock' argument. > 684: if (fast_lock_spin_enter(obj(), current, observed_deflation)) { So here the lock_stack doesn't contain the object but we're trying to spin in order to stall creating or fetching an ObjectMonitor for this lock. Is that right? Can you say why this is. It seems to spin before testing whether it can get the lock. Is that because the caller already tried and found the object locked? The first_time => observed_deflation rename is good. That helps a lot because reading this, you don't expect to have to wait. 
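For readers following the thread, the exponential backoff described in the PR summary can be sketched roughly as below. This is only an illustrative stand-alone sketch, not the code in `lightweightSynchronizer.cpp`: `spin_pause_stub`, `total_pauses` and `spin_lock_with_backoff` are hypothetical names, a plain `std::atomic<int>` stands in for the mark word, and the inflation and safepoint checks mentioned in the review are left out. It assumes the wait doubles after each failed CAS, which is what gives a total on the order of 2^OMSpins.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HotSpot's SpinPause(); a no-op here. On real
// hardware this would be something like a PAUSE or YIELD instruction.
inline void spin_pause_stub() {}

// Total pauses executed if all `attempts` CAS attempts fail and the wait
// doubles each time: 1 + 2 + 4 + ... + 2^(attempts-1) = 2^attempts - 1.
constexpr std::uint64_t total_pauses(int attempts) {
  return (std::uint64_t{1} << attempts) - 1;
}

// Try to move `lock_word` from 0 (unlocked) to 1 (locked), making at most
// `max_attempts` CAS attempts. After the i-th failed attempt (0-based) the
// thread pauses 2^i times before retrying, to reduce cache-line contention.
// Returns true if the lock was taken; on false a caller would take a slow
// path (in the thread's terms: inflate and enter the ObjectMonitor).
inline bool spin_lock_with_backoff(std::atomic<int>& lock_word,
                                   int max_attempts,
                                   std::uint64_t* pauses_out = nullptr) {
  std::uint64_t pauses = 0;
  bool locked = false;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    int expected = 0;
    if (lock_word.compare_exchange_strong(expected, 1,
                                          std::memory_order_acquire)) {
      locked = true;
      break;
    }
    // Failed CAS: wait twice as long as after the previous failure.
    const std::uint64_t wait = std::uint64_t{1} << attempt;
    for (std::uint64_t i = 0; i < wait; ++i) {
      spin_pause_stub();
      ++pauses;
    }
  }
  if (pauses_out != nullptr) {
    *pauses_out = pauses;
  }
  return locked;
}
```

With this shape, a limit of 7 attempts costs at most 127 pauses and a limit of 14 costs at most 16383, which is consistent with the rough 128 to 16384 range quoted in the summary.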
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 698: > 696: } > 697: > 698: observed_deflation = true; I see now. Can you comment that if inflate_and_enter fails, it means that deflation was observed is why? ------------- PR Review: https://git.openjdk.org/lilliput/pull/177#pullrequestreview-2077540175 PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1613833913 PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1613841296 PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1613845118 From coleenp at openjdk.org Fri May 24 18:15:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 24 May 2024 18:15:13 GMT Subject: [master] RFR: OMWorld: Spin Changes [v2] In-Reply-To: References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com>

Message-ID: On Fri, 24 May 2024 15:06:03 GMT, Axel Boldt-Christmas wrote: > backoff changes the spinning should be interrupted if inflation is observed. Do you mean deflation? Also if a safepoint is observed too? Edit: no, right, you mean inflation by some other thread (mark.has_monitor()). ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1613849648 From coleenp at openjdk.org Fri May 24 18:18:18 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 24 May 2024 18:18:18 GMT Subject: [master] RFR: OMWorld: reenable all platforms In-Reply-To: References: Message-ID: On Thu, 23 May 2024 06:26:20 GMT, Axel Boldt-Christmas wrote: > This reenables all platforms to use OMWorld, and by extension UseCompactObjectHeaders. > > This change simply calls the runtime if a lock is inflated, until port support for OMWorld cache lookup is added. > > ARM (32-bit) required no changes as it already always called the runtime when a monitor is inflated. Yes, let's hold this change for now. ------------- PR Comment: https://git.openjdk.org/lilliput/pull/174#issuecomment-2130117986 From coleenp at openjdk.org Sat May 25 00:35:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 25 May 2024 00:35:33 GMT Subject: [master] RFR: Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode Message-ID: This branches off for lightweight mode, but there were a couple of lightweight cases left over. Tested locally hotspot:tier1 with default and with -XX:LockingMode=1. 
------------- Commit messages: - Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode Changes: https://git.openjdk.org/lilliput/pull/179/files Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=179&range=00 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/lilliput/pull/179.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/179/head:pull/179 PR: https://git.openjdk.org/lilliput/pull/179 From aboldtch at openjdk.org Tue May 28 08:14:21 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 28 May 2024 08:14:21 GMT Subject: [master] RFR: OMWorld: Cleanups [v2] In-Reply-To: References:

<8av3jUwoRMZ3W2AoDNT55pW8RQYptLjcXxHSNKs3Ib4=.362fa40d-5ca4-495c-92a1-0efe5e51bb98@github.com>

Message-ID: On Fri, 24 May 2024 17:52:35 GMT, Coleen Phillimore wrote: >> Maybe I should say `valid ObjectMonitor* check` instead of `null check`. >> >> The reason that the check used to be against `2` is because some iterations of the code have signalled information in the lowest bit. >> >> Currently it could just be a check agains `1` which would check for `nullptr` while also setting Flags to NE. >> >> The check is simply that based on the fact that no valid ObjectMonitor* except nullptr can exist below `alignof(ObjectMonitor*)`. > > But in the current code now, the field should be either nullptr or a valid ObjectMonitor, right? If you simply checked against nullptr, would it set the NE flags? All the extra casting etc makes this a puzzler where testing against nullptr does what it says. For `nullptr < A <= valid ObjectMonitor*` the only valid values of A are `{1,2,3,4,5,6,7,8}` for aarch64. And in general A is in `[1, alignof(ObjectMonitor*)]`. For me it was more natural to compare with `alignof(ObjectMonitor*)` than comparing with `1` or some other arbitrary unnamed constant. ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/178#discussion_r1616790585 From aboldtch at openjdk.org Tue May 28 08:14:35 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 28 May 2024 08:14:35 GMT Subject: [master] RFR: OMWorld: Spin Changes [v3] In-Reply-To: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> Message-ID: <4iChha3AtF-jqzxgnQwUu3_3b5r3HCi6ccNxBRj-GXE=.b8311ed1-9f58-49b5-85d3-abcc642caa97@github.com> > The fast lock spinning uses `sched_yield` which tends to be discouraged for spin locking code. Instead only use `SpinPause` with exponential backoff. Where after each failed CAS wait for exponentially more time until trying again in an attempt to reduce cache contention. 
> > This change also makes the spinning aware of safepoints, and tries to fast track the execution to next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor. > > Have not removed `OMSpins` yet, as the exact value is not determined yet. It may have to be platform specific as `SpinPause` have different characteristics on different hardware. OMSpins is the number of fast lock, with each attempt spinning for twice as much as the last, so the total number of spins are on the order of O(2^OMSpins). It will probably land somewhere on the range of 7-14 (128 -16384 spins) Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Improve spinning and add comment - Add OMSpins description ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/177/files - new: https://git.openjdk.org/lilliput/pull/177/files/b021a26a..b12cc980 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=177&range=02 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=177&range=01-02 Stats: 21 lines in 2 files changed: 19 ins; 0 del; 2 mod Patch: https://git.openjdk.org/lilliput/pull/177.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/177/head:pull/177 PR: https://git.openjdk.org/lilliput/pull/177 From aboldtch at openjdk.org Tue May 28 08:15:24 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 28 May 2024 08:15:24 GMT Subject: [master] RFR: Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode In-Reply-To: References: Message-ID: On Sat, 25 May 2024 00:30:23 GMT, Coleen Phillimore wrote: > This branches off for lightweight mode, but there were a couple of lightweight cases left over. > Tested locally hotspot:tier1 with default and with -XX:LockingMode=1. Marked as reviewed by aboldtch (Committer). 
------------- PR Review: https://git.openjdk.org/lilliput/pull/179#pullrequestreview-2082070720 From coleenp at openjdk.org Tue May 28 12:59:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 May 2024 12:59:20 GMT Subject: [master] RFR: Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode In-Reply-To: References: Message-ID: On Sat, 25 May 2024 00:30:23 GMT, Coleen Phillimore wrote: > This branches off for lightweight mode, but there were a couple of lightweight cases left over. > Tested locally hotspot:tier1 with default and with -XX:LockingMode=1. Thanks Axel. ------------- PR Comment: https://git.openjdk.org/lilliput/pull/179#issuecomment-2135150797 From coleenp at openjdk.org Tue May 28 12:59:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 May 2024 12:59:20 GMT Subject: [master] Integrated: Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode In-Reply-To: References: Message-ID: On Sat, 25 May 2024 00:30:23 GMT, Coleen Phillimore wrote: > This branches off for lightweight mode, but there were a couple of lightweight cases left over. > Tested locally hotspot:tier1 with default and with -XX:LockingMode=1. This pull request has now been integrated. Changeset: 74efaf01 Author: Coleen Phillimore URL: https://git.openjdk.org/lilliput/commit/74efaf01c8a932a8131cc911f1a4c2f7b5462f43 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode Reviewed-by: aboldtch ------------- PR: https://git.openjdk.org/lilliput/pull/179 From aboldtch at openjdk.org Tue May 28 14:08:30 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 28 May 2024 14:08:30 GMT Subject: [master] Integrated: OMWorld: Cleanups In-Reply-To: References: Message-ID: On Thu, 23 May 2024 06:59:06 GMT, Axel Boldt-Christmas wrote: > This contains a handful of miscellaneous cleanups. 
Removing dead code, fixes/cleanups todos, cleanup logging and removing all special handling of `x86_32` not having a thread register / lacking registers. This pull request has now been integrated. Changeset: c06e0873 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/lilliput/commit/c06e0873b5fbc1310d47bea61088db228a557458 Stats: 95 lines in 14 files changed: 6 ins; 66 del; 23 mod OMWorld: Cleanups Reviewed-by: coleenp ------------- PR: https://git.openjdk.org/lilliput/pull/178 From aboldtch at openjdk.org Tue May 28 14:09:17 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 28 May 2024 14:09:17 GMT Subject: [master] Integrated: OMWorld: Remove OMRecursiveFastPath In-Reply-To: References: Message-ID: On Thu, 23 May 2024 06:37:53 GMT, Axel Boldt-Christmas wrote: > The `OMRecursiveFastPath` was an experiment that was introduced when recursive lightweight was developed. It could show some gains in some scenarios on specific hardware, but remove it for now. Checking for recursion via a failed CAS that reads out the owner is good enough. This pull request has now been integrated. Changeset: 1413b467 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/lilliput/commit/1413b467fcbbeef8b1a9aec5f01ccb5433e4e11c Stats: 32 lines in 3 files changed: 0 ins; 26 del; 6 mod OMWorld: Remove OMRecursiveFastPath Reviewed-by: coleenp ------------- PR: https://git.openjdk.org/lilliput/pull/176 From coleenp at openjdk.org Tue May 28 14:21:17 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 May 2024 14:21:17 GMT Subject: [master] RFR: OMWorld: Spin Changes [v2] In-Reply-To: References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com>

Message-ID: On Fri, 24 May 2024 18:02:08 GMT, Coleen Phillimore wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: >> >> - Cancel spinning early in case of inflation >> - Renamed first_time variable >> - Extract fast_lock_spin_enter > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 684: > >> 682: while (true) { >> 683: // Fast-locking does not use the 'lock' argument. >> 684: if (fast_lock_spin_enter(obj(), current, observed_deflation)) { > > So here the lock_stack doesn't contain the object but we're trying to spin in order to stall creating or fetching an ObjectMonitor for this lock. Is that right? Can you say why this is. It seems to spin before testing whether it can get the lock. Is that because the caller already tried and found the object locked? > > The first_time => observed_deflation rename is good. That helps a lot because reading this, you don't expect to have to wait. Can you add the comment that says why we spin here, ie. to stall so we can avoid creating an object monitor? > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 698: > >> 696: } >> 697: >> 698: observed_deflation = true; > > I see now. Can you comment that if inflate_and_enter fails, it means that deflation was observed is why? 
s/responisble/responsible/ typo ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1617330993 PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1617326505 From coleenp at openjdk.org Tue May 28 14:21:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 May 2024 14:21:16 GMT Subject: [master] RFR: OMWorld: Spin Changes [v3] In-Reply-To: <4iChha3AtF-jqzxgnQwUu3_3b5r3HCi6ccNxBRj-GXE=.b8311ed1-9f58-49b5-85d3-abcc642caa97@github.com> References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> <4iChha3AtF-jqzxgnQwUu3_3b5r3HCi6ccNxBRj-GXE=.b8311ed1-9f58-49b5-85d3-abcc642caa97@github.com> Message-ID: On Tue, 28 May 2024 08:14:35 GMT, Axel Boldt-Christmas wrote: >> The fast lock spinning uses `sched_yield` which tends to be discouraged for spin locking code. Instead only use `SpinPause` with exponential backoff. Where after each failed CAS wait for exponentially more time until trying again in an attempt to reduce cache contention. >> >> This change also makes the spinning aware of safepoints, and tries to fast track the execution to next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor. >> >> Have not removed `OMSpins` yet, as the exact value is not determined yet. It may have to be platform specific as `SpinPause` have different characteristics on different hardware. OMSpins is the number of fast lock, with each attempt spinning for twice as much as the last, so the total number of spins are on the order of O(2^OMSpins). It will probably land somewhere on the range of 7-14 (128 -16384 spins) > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Improve spinning and add comment > - Add OMSpins description A couple more comments or comment requests, increasingly minor. 
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 573: > 571: > 572: markWord mark = obj->mark(); > 573: const auto try_spin = [&]() { Instead of try_spin, can you name this should_spin since it doesn't actually spin. ------------- Marked as reviewed by coleenp (Committer). PR Review: https://git.openjdk.org/lilliput/pull/177#pullrequestreview-2082934516 PR Review Comment: https://git.openjdk.org/lilliput/pull/177#discussion_r1617326225 From aboldtch at openjdk.org Tue May 28 15:07:49 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 28 May 2024 15:07:49 GMT Subject: [master] RFR: OMWorld: Spin Changes [v4] In-Reply-To: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> Message-ID: > The fast lock spinning uses `sched_yield` which tends to be discouraged for spin locking code. Instead only use `SpinPause` with exponential backoff. Where after each failed CAS wait for exponentially more time until trying again in an attempt to reduce cache contention. > > This change also makes the spinning aware of safepoints, and tries to fast track the execution to next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor. > > Have not removed `OMSpins` yet, as the exact value is not determined yet. It may have to be platform specific as `SpinPause` have different characteristics on different hardware. OMSpins is the number of fast lock, with each attempt spinning for twice as much as the last, so the total number of spins are on the order of O(2^OMSpins). 
It will probably land somewhere on the range of 7-14 (128 -16384 spins) Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Add comment - s/responisble/responsible/ - s/try_spin/should_spin/ ------------- Changes: - all: https://git.openjdk.org/lilliput/pull/177/files - new: https://git.openjdk.org/lilliput/pull/177/files/b12cc980..ae3c5b21 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=177&range=03 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=177&range=02-03 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/lilliput/pull/177.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/177/head:pull/177 PR: https://git.openjdk.org/lilliput/pull/177 From coleenp at openjdk.org Tue May 28 16:36:26 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 May 2024 16:36:26 GMT Subject: [master] RFR: Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode In-Reply-To: References: Message-ID: On Sat, 25 May 2024 00:30:23 GMT, Coleen Phillimore wrote: > This branches off for lightweight mode, but there were a couple of lightweight cases left over. > Tested locally hotspot:tier1 with default and with -XX:LockingMode=1. So I have to undo this to have a UseObjectMonitorTable option. 
------------- PR Comment: https://git.openjdk.org/lilliput/pull/179#issuecomment-2135680313 From coleenp at openjdk.org Tue May 28 18:50:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 May 2024 18:50:16 GMT Subject: [master] RFR: OMWorld: Spin Changes [v4] In-Reply-To: References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com> Message-ID: <_pKkx3Aab_3AY-Tuzn-mKQUfy3QM0MIMcukJgO7yW2U=.2e032500-9d31-4c61-a278-b509ab129e6c@github.com> On Tue, 28 May 2024 15:07:49 GMT, Axel Boldt-Christmas wrote: >> The fast lock spinning uses `sched_yield` which tends to be discouraged for spin locking code. Instead only use `SpinPause` with exponential backoff. Where after each failed CAS wait for exponentially more time until trying again in an attempt to reduce cache contention. >> >> This change also makes the spinning aware of safepoints, and tries to fast track the execution to next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor. >> >> Have not removed `OMSpins` yet, as the exact value is not determined yet. It may have to be platform specific as `SpinPause` have different characteristics on different hardware. OMSpins is the number of fast lock, with each attempt spinning for twice as much as the last, so the total number of spins are on the order of O(2^OMSpins). It will probably land somewhere on the range of 7-14 (128 -16384 spins) > > Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: > > - Add comment > - s/responisble/responsible/ > - s/try_spin/should_spin/ I like the comments. Thank you! ------------- Marked as reviewed by coleenp (Committer). 
PR Review: https://git.openjdk.org/lilliput/pull/177#pullrequestreview-2083622281 From aboldtch at openjdk.org Wed May 29 06:17:18 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 29 May 2024 06:17:18 GMT Subject: [master] RFR: Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode In-Reply-To: References:

Message-ID: On Tue, 28 May 2024 16:33:32 GMT, Coleen Phillimore wrote:

> So I have to undo this to have a UseObjectMonitorTable option.

That might be the easiest initial approach. It might be worth to move all the LM_LIGHTWEIGHT code to LightweightSynchronizer regardless. For example the FastHashCode could become:

```C++
intptr_t LightweightSynchronizer::FastHashCode(Thread* current, oop obj) {
  assert(LockingMode == LM_LIGHTWEIGHT, "must be");

  markWord mark = obj->mark_acquire();
  for(;;) {
    if (UseOMWorldFlag || !mark.has_monitor()) {
      intptr_t hash = mark.hash();
      if (hash != 0) {
        return hash;
      }

      hash = ObjectSynchronizer::get_next_hash(current, obj);
      const markWord old_mark = mark;
      const markWord new_mark = old_mark.copy_set_hash(hash);

      mark = obj->cas_set_mark(new_mark, old_mark);
      if (old_mark == mark) {
        return hash;
      }
    } else {
      ObjectMonitor* monitor = mark.monitor();
      markWord displaced_header = monitor->header();
      intptr_t hash = displaced_header.hash();
      if (hash == 0) {
        // Try to install new hash.
        hash = ObjectSynchronizer::get_next_hash(current, obj);
        displaced_header = mark.copy_set_hash(hash);
        displaced_header = markWord(Atomic::cmpxchg(monitor->metadata_addr(), mark.value(), displaced_header.value()));
        if (displaced_header != mark) {
          // Someone installed another hash before us.
          hash = displaced_header.hash();
        }
      } else {
        // Must order the markWord read and the ObjectMonitor contentions read.
        OrderAccess::loadload_for_IRIW();
      }
      if (monitor->is_being_async_deflated()) {
        // We do not trust this monitor, assist in deflating the ObjectMonitor and retry.
        // This can only happen once per call.
        monitor->install_displaced_markword_in_object(obj);
        mark = obj->mark_acquire();
        continue;
      }
      return hash;
    }
  }
}
```

The whole `else` branch could be shared with the ObjectSynchronizer::FastHashCode. (And added to `ObjectMonitor::FastHashCode`.) It makes more sense from an access control point of view as well.
Currently ObjectSynchronizer::FastHashCode uses its friend class access to manipulate the ObjectMonitor, which feels like a sign of poor design.

This could then become something in the style of:

```C++
intptr_t LightweightSynchronizer::FastHashCode(Thread* current, oop obj) {
  assert(LockingMode == LM_LIGHTWEIGHT, "must be");

  markWord mark = obj->mark_acquire();
  for (;;) {
    if (UseOMWorldFlag || !mark.has_monitor()) {
      intptr_t hash = mark.hash();
      if (hash != 0) {
        return hash;
      }

      hash = ObjectSynchronizer::get_next_hash(current, obj);
      const markWord old_mark = mark;
      const markWord new_mark = old_mark.copy_set_hash(hash);

      mark = obj->cas_set_mark(new_mark, old_mark);
      if (old_mark == mark) {
        return hash;
      }
    } else {
      ObjectMonitor* monitor = mark.monitor();
      intptr_t hash = 0;
      if (monitor->FastHashCode(current, obj, &hash)) {
        return hash;
      }
      // The monitor could not be trusted (it is being deflated);
      // reload the mark word before retrying.
      mark = obj->mark_acquire();
    }
  }
}
```

-------------

PR Comment: https://git.openjdk.org/lilliput/pull/179#issuecomment-2136599739

From aboldtch at openjdk.org Wed May 29 06:22:17 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 29 May 2024 06:22:17 GMT
Subject: [master] Integrated: OMWorld: Spin Changes
In-Reply-To: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com>
References: <1AhQHN_CC_o2Nr4GnFmJYM6q6GsqUAV6UIvo9iWdxwQ=.ec5b8c75-2eff-409c-ba6e-825d4cd0f6a8@github.com>
Message-ID:

On Thu, 23 May 2024 06:54:51 GMT, Axel Boldt-Christmas wrote:

> The fast lock spinning uses `sched_yield`, which tends to be discouraged for spin locking code. Instead, only use `SpinPause` with exponential backoff: after each failed CAS, wait exponentially longer before trying again, in an attempt to reduce cache contention.
>
> This change also makes the spinning aware of safepoints, and tries to fast track the execution to the next poll, which is either when successfully locked (VM backedge transition) or when going into blocked to enter the ObjectMonitor.
>
> Have not removed `OMSpins` yet, as the exact value is not determined yet. It may have to be platform specific, as `SpinPause` has different characteristics on different hardware. OMSpins is the number of fast lock attempts, with each attempt spinning for twice as long as the last, so the total number of spins is on the order of O(2^OMSpins). It will probably land somewhere in the range of 7-14 (128-16384 spins).

This pull request has now been integrated.

Changeset: aa4a4083
Author: Axel Boldt-Christmas
URL: https://git.openjdk.org/lilliput/commit/aa4a4083eb456fec460cbd301edfe3b993b4cd90
Stats: 104 lines in 3 files changed: 73 ins; 19 del; 12 mod

OMWorld: Spin Changes

Reviewed-by: coleenp

-------------

PR: https://git.openjdk.org/lilliput/pull/177

From aboldtch at openjdk.org Wed May 29 06:27:32 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 29 May 2024 06:27:32 GMT
Subject: [master] RFR: OMWorld: Decouple deflation and table sizing [v2]
In-Reply-To:
References:
Message-ID:

> The change reverts all changes to deflation and moves the resizing of the OMWorld ConcurrentHashTable to the service thread, using logic similar to how we resize the Symbol- and StringTables.
>
> The option to shrink the table is taken out and can be reintroduced at a later date as an enhancement. To do it correctly, the interactions with deflation need to be figured out.

Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains two commits:

 - Merge remote-tracking branch 'upstream_lilliput/master' into lilliput-decouple-deflation
 - Decouple deflation and table sizing

-------------

Changes: https://git.openjdk.org/lilliput/pull/175/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=175&range=01
Stats: 206 lines in 7 files changed: 65 ins; 91 del; 50 mod
Patch: https://git.openjdk.org/lilliput/pull/175.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/175/head:pull/175

PR: https://git.openjdk.org/lilliput/pull/175

From coleenp at openjdk.org Wed May 29 17:54:17 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 29 May 2024 17:54:17 GMT
Subject: [master] RFR: Remove a couple LIGHTWEIGHT things from the legacy/monitor version of FastHashCode
In-Reply-To:
References:
Message-ID: <1vnT6U7z4EEV3SVV1jQPotjb4XGMFK2yRuzEhFyXdjY=.44325ec9-0489-4959-8f3d-a14ff32b66c7@github.com>

On Sat, 25 May 2024 00:30:23 GMT, Coleen Phillimore wrote:

> This branches off for lightweight mode, but there were a couple of lightweight cases left over.
> Tested locally hotspot:tier1 with default and with -XX:LockingMode=1.

I kind of went a different way but I like this refactoring. Moving knowledge of monitor into ObjectMonitor seems like the right thing to do.
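
The hash installation in the FastHashCode sketches in this thread follows a read, compute, compare-and-swap pattern: the expected value passed to the CAS has to be the word that was just read from the same location, and a thread that loses the race adopts the winner's hash. A minimal self-contained model of that pattern, assuming a plain `std::atomic` word in place of HotSpot's `markWord` and `Atomic::cmpxchg` (`ToyHeader`, `kHashMask` and `install_hash` are hypothetical names, not HotSpot APIs), might look like this:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a header word: the low 31 bits hold the hash,
// and a hash of 0 means "no hash installed yet".
constexpr std::uint64_t kHashMask = 0x7fffffffULL;

struct ToyHeader {
  std::atomic<std::uint64_t> word{0};
};

// Installs `candidate` as the hash unless another thread got there first.
// Returns whichever hash ended up in the header.
inline std::uint64_t install_hash(ToyHeader& header, std::uint64_t candidate) {
  assert((candidate & kHashMask) != 0 && "candidate hash must be non-zero");
  std::uint64_t old_word = header.word.load(std::memory_order_acquire);
  for (;;) {
    const std::uint64_t existing = old_word & kHashMask;
    if (existing != 0) {
      return existing;  // Someone installed a hash before us; adopt it.
    }
    // Keep all non-hash bits of the word we read and set only the hash bits.
    const std::uint64_t new_word = (old_word & ~kHashMask) | (candidate & kHashMask);
    // The expected value is the word we read from this very location.
    // On failure, compare_exchange_weak stores the current value in old_word.
    if (header.word.compare_exchange_weak(old_word, new_word,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
      return candidate & kHashMask;
    }
  }
}
```

The first caller's candidate wins and every later caller gets the same value back, which is the property an identity hash needs.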
-------------

PR Comment: https://git.openjdk.org/lilliput/pull/179#issuecomment-2137960317

From aboldtch at openjdk.org Thu May 30 08:43:59 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 30 May 2024 08:43:59 GMT
Subject: [master] RFR: Lilliput master rebased on jdk-23+24
Message-ID: <6fOg6fb6yI_IFjOXZiRdiHngcNfbfn23U7qZNEtwhfw=.45d1a860-b2a2-4097-96ec-003f085751b0@github.com>

The patch queue was squashed from https://github.com/xmas92/lilliput/compare/lilliput_master_rebased...xmas92:lilliput:lilliput_master_rebased_pre_squash to https://github.com/xmas92/lilliput/compare/lilliput_master_rebased_pre_squash...xmas92:lilliput:lilliput_master_rebased

Testing
* Tier 1-3 with `+UseCompactObjectHeaders`
  * Pre-existing issues due to changed default CDS archive names
    * `tools/jlink/plugins/CDSPluginTest.java`
    * `runtime/cds/appcds/dynamicArchive/TestAutoCreateSharedArchiveNoDefaultArchive.java`
* Tier 1-3 with `-UseCompactObjectHeaders`

-------------

Commit messages:
 - Tiny Class-Pointers
 - 8305895: Implementation: JEP 450: Compact Object Headers (Experimental)
 - 8305896: Alternative full GC forwarding
 - 8305898: Alternative self-forwarding mechanism
 - 8315884: New Object to ObjectMonitor mapping
 - Lilliput conf changes for jcheck
 - Merge branch 'lilliput_master' into lilliput_rebase_target
 - 8307193: Several Swing jtreg tests use class.forName on L&F classes
 - 8332490: JMH org.openjdk.bench.java.util.zip.InflaterInputStreams.inflaterInputStreamRead OOM
 - 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed
 - ... and 486 more: https://git.openjdk.org/lilliput/compare/aa4a4083...76deeaa4

Changes: https://git.openjdk.org/lilliput/pull/180/files
Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=180&range=00
Stats: 155373 lines in 3226 files changed: 91282 ins; 50051 del; 14040 mod
Patch: https://git.openjdk.org/lilliput/pull/180.diff
Fetch: git fetch https://git.openjdk.org/lilliput.git pull/180/head:pull/180

PR: https://git.openjdk.org/lilliput/pull/180
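
The "OMWorld: Spin Changes" message earlier in this archive describes spinning with exponential backoff, where each failed CAS is followed by twice as many pauses as the previous one, so the total work grows on the order of 2^OMSpins. A minimal sketch of that idea, assuming a plain `std::atomic<bool>` lock and a portable placeholder for `SpinPause` (`cpu_relax`, `total_pauses` and `try_lock_with_backoff` are hypothetical names, and this is not the code from the changeset), could be written as:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Total pause iterations if attempt i (0-based) pauses 2^i times:
// 1 + 2 + ... + 2^(attempts - 1) = 2^attempts - 1.
constexpr std::uint64_t total_pauses(unsigned attempts) {
  return (std::uint64_t{1} << attempts) - 1;
}

// Placeholder for a pause hint such as HotSpot's SpinPause. This is only a
// compiler barrier; a real implementation would use a platform-specific hint.
inline void cpu_relax() {
  std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Makes at most `max_attempts` CAS attempts. After each failed attempt it
// pauses for twice as long as after the previous one. Returns true if the
// lock was acquired.
inline bool try_lock_with_backoff(std::atomic<bool>& locked, unsigned max_attempts) {
  assert(max_attempts < 64 && "shift below would overflow");
  for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
    bool expected = false;
    if (locked.compare_exchange_strong(expected, true,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
      return true;
    }
    const std::uint64_t pauses = std::uint64_t{1} << attempt;
    for (std::uint64_t i = 0; i < pauses; ++i) {
      cpu_relax();
    }
  }
  return false;  // Caller falls back to a slower path, e.g. blocking.
}
```

With 7 attempts this performs at most 127 pauses and with 14 attempts at most 16383, which matches the 128 to 16384 order of magnitude quoted for the 7-14 range.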