From aboldtch at openjdk.org Mon Jul 1 09:19:41 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 1 Jul 2024 09:19:41 GMT Subject: [master] RFR: 8335251: [Lilliput] Fix TestRecursiveMonitorChurn failure In-Reply-To: <0SRuRqA3FD70l9eNAZ0JxGQM7HkSoj1JTtR3rsIoSYo=.bb506810-abcd-4f52-833a-5858c1dc0e6d@github.com> References: <0SRuRqA3FD70l9eNAZ0JxGQM7HkSoj1JTtR3rsIoSYo=.bb506810-abcd-4f52-833a-5858c1dc0e6d@github.com> Message-ID: On Fri, 28 Jun 2024 09:29:09 GMT, Roman Kennke wrote: > That looks even better and more reliable. Want to put it in Lilliput repo instead of my PR? Either works. I created [JDK-8335397](https://bugs.openjdk.org/browse/JDK-8335397) and plan to push it to mainline ahead of time. ------------- PR Comment: https://git.openjdk.org/lilliput/pull/186#issuecomment-2199645216 From aboldtch at openjdk.org Mon Jul 1 09:24:48 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 1 Jul 2024 09:24:48 GMT Subject: [master] RFR: 8335251: [Lilliput] Fix TestRecursiveMonitorChurn failure In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 11:21:13 GMT, Roman Kennke wrote: > The test TestRecursiveMonitorChurn currently fails with Lilliput or UseObjectMonitorTable, because the monitor table is also allocated with mtObjectMonitor tag, and the threshold in the test is too low. > > The fix is to increase the threshold so that it covers the table, but not so much that we'd get false positives. 100,000 seems to hit that spot nicely. (The memory usage with table is about 70,000, the failure case is over the 1,000,000 mark.) Created https://github.com/openjdk/jdk/pull/19965 ------------- PR Comment: https://git.openjdk.org/lilliput/pull/186#issuecomment-2199654942 From coleenp at openjdk.org Mon Jul 1 20:22:58 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 Jul 2024 20:22:58 GMT Subject: [master] RFR: Add some inlining and avoid CAS for quick_enter for lightweight locking.
Message-ID: This is sort of a simple change to inline the code that decides to go to LightweightSynchronizer in case the call has some negative effects for performance sensitive code. Avoiding the CAS in try_enter was the most helpful. With perf, I found that the CAS had a longer stall than with the code without the OM world refactoring. The OM world table is off for this comparison. Tested with tier1-4, including some local changes to run tier1 tests with -XX:+UseObjectMonitorTable. ------------- Commit messages: - Add some inlining and avoid CAS for quick_enter for lightweight locking. Changes: https://git.openjdk.org/lilliput/pull/187/files Webrev: https://webrevs.openjdk.org/?repo=lilliput&pr=187&range=00 Stats: 151 lines in 9 files changed: 89 ins; 34 del; 28 mod Patch: https://git.openjdk.org/lilliput/pull/187.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/187/head:pull/187 PR: https://git.openjdk.org/lilliput/pull/187 From aboldtch at openjdk.org Tue Jul 2 07:07:43 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 2 Jul 2024 07:07:43 GMT Subject: [master] RFR: Add some inlining and avoid CAS for quick_enter for lightweight locking. In-Reply-To: References: Message-ID: On Mon, 1 Jul 2024 20:18:48 GMT, Coleen Phillimore wrote: > This is sort of a simple change to inline the code that decides to go to LightweightSynchronizer in case the call has some negative effects for performance sensitive code. Avoiding the CAS in try_enter was the most helpful. With perf, I found that the CAS had a longer stall than with the code without the OM world refactoring. The OM world table is off for this comparison. > Tested with tier1-4, including some local changes to run tier1 tests with -XX:+UseObjectMonitorTable. Marked as reviewed by aboldtch (Committer). 
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 578: > 576: }; > 577: > 578: inline bool LightweightSynchronizer::check_unlocked(oop obj, LockStack& lock_stack, JavaThread* current) { This could be called something like `fast_lock_try_enter` or `fast_lock_enter`. Or even `enter_fast_lock` (similar name to what is in synchronizer.cpp). I think I like `fast_lock_try_enter` or `fast_lock_enter` the best. src/hotspot/share/runtime/objectMonitor.cpp line 385: > 383: > 384: bool ObjectMonitor::try_enter(JavaThread* current) { > 385: if (LockingMode == LM_LIGHTWEIGHT) { Could probably just do this for all LockingModes. ------------- PR Review: https://git.openjdk.org/lilliput/pull/187#pullrequestreview-2152862078 PR Review Comment: https://git.openjdk.org/lilliput/pull/187#discussion_r1661939993 PR Review Comment: https://git.openjdk.org/lilliput/pull/187#discussion_r1661941127 From coleenp at openjdk.org Tue Jul 2 14:24:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 14:24:33 GMT Subject: [master] RFR: Add some inlining and avoid CAS for quick_enter for lightweight locking. In-Reply-To: References: Message-ID: On Mon, 1 Jul 2024 20:18:48 GMT, Coleen Phillimore wrote: > This is sort of a simple change to inline the code that decides to go to LightweightSynchronizer in case the call has some negative effects for performance sensitive code. Avoiding the CAS in try_enter was the most helpful. With perf, I found that the CAS had a longer stall than with the code without the OM world refactoring. The OM world table is off for this comparison. > Tested with tier1-4, including some local changes to run tier1 tests with -XX:+UseObjectMonitorTable. Thank you for your comments, Axel. 
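The "avoid the CAS in try_enter" point above is essentially the test-and-test-and-set pattern: read the owner field with a plain load first, and only attempt the compare-exchange when that load says the lock looks free. A generic, self-contained sketch of that shape (the `TryLock` type and field names are illustrative, not the HotSpot code):

```cpp
#include <atomic>
#include <cstdint>

// Generic sketch of "read before CAS": a plain load is much cheaper than
// a failed compare-exchange, so try_enter first inspects the owner and
// only CASes when the lock appears unowned. Names are hypothetical.
struct TryLock {
  std::atomic<uintptr_t> owner_{0};  // 0 == unowned

  bool try_enter(uintptr_t self) {
    uintptr_t cur = owner_.load(std::memory_order_relaxed);
    if (cur == self) {
      return true;               // already owned by us; no CAS needed
    }
    if (cur != 0) {
      return false;              // contended; give up without a CAS
    }
    uintptr_t expected = 0;      // lock looks free: now pay for one CAS
    return owner_.compare_exchange_strong(expected, self,
                                          std::memory_order_acquire);
  }

  void exit() { owner_.store(0, std::memory_order_release); }
};
```

The design point is that under contention this version never dirties the cache line with a failing CAS, which matches the longer-stall observation made with perf above.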
------------- PR Review: https://git.openjdk.org/lilliput/pull/187#pullrequestreview-2154002470 From coleenp at openjdk.org Tue Jul 2 14:24:34 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 14:24:34 GMT Subject: [master] RFR: Add some inlining and avoid CAS for quick_enter for lightweight locking. In-Reply-To: References: Message-ID: <7AE5_61WVFDMCVBc2PthgjuPmjPV8X7GkDzENDBXoB0=.a6232621-7f41-4b7a-b89a-2d5b394dd7e8@github.com> On Tue, 2 Jul 2024 06:59:21 GMT, Axel Boldt-Christmas wrote: >> This is sort of a simple change to inline the code that decides to go to LightweightSynchronizer in case the call has some negative effects for performance sensitive code. Avoiding the CAS in try_enter was the most helpful. With perf, I found that the CAS had a longer stall than with the code without the OM world refactoring. The OM world table is off for this comparison. >> Tested with tier1-4, including some local changes to run tier1 tests with -XX:+UseObjectMonitorTable. > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 578: > >> 576: }; >> 577: >> 578: inline bool LightweightSynchronizer::check_unlocked(oop obj, LockStack& lock_stack, JavaThread* current) { > > This could be called something like `fast_lock_try_enter` or `fast_lock_enter`. Or even `enter_fast_lock` (similar name to what is in synchronizer.cpp). > > I think I like `fast_lock_try_enter` or `fast_lock_enter` the best. I like fast_lock_try_enter - that looks good in the code and describes the situation. > src/hotspot/share/runtime/objectMonitor.cpp line 385: > >> 383: >> 384: bool ObjectMonitor::try_enter(JavaThread* current) { >> 385: if (LockingMode == LM_LIGHTWEIGHT) { > > Could probably just do this for all LockingModes. Yes, you convinced me that reading the owner for legacy mode doesn't need to be done as the return from the CAS because it can change anyway. 
If the owner is the stack lock, which the code is testing for, it will never change here so it's ok to read it again. ------------- PR Review Comment: https://git.openjdk.org/lilliput/pull/187#discussion_r1662628264 PR Review Comment: https://git.openjdk.org/lilliput/pull/187#discussion_r1662629707 From coleenp at openjdk.org Tue Jul 2 14:44:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 14:44:48 GMT Subject: [master] RFR: Add some inlining and avoid CAS for quick_enter for lightweight locking. [v2] In-Reply-To: References: Message-ID: <-0kwuhvBo4dKEq0JNdlCt5Vo4EqCCS_66CBlxa5ewkk=.2a1eead4-baf6-42e5-87af-b2b1877785b4@github.com> > This is sort of a simple change to inline the code that decides to go to LightweightSynchronizer in case the call has some negative effects for performance sensitive code. Avoiding the CAS in try_enter was the most helpful. With perf, I found that the CAS had a longer stall than with the code without the OM world refactoring. The OM world table is off for this comparison. > Tested with tier1-4, including some local changes to run tier1 tests with -XX:+UseObjectMonitorTable. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Axel review comments. 
------------- Changes: - all: https://git.openjdk.org/lilliput/pull/187/files - new: https://git.openjdk.org/lilliput/pull/187/files/fb00e8dd..063a1e36 Webrevs: - full: https://webrevs.openjdk.org/?repo=lilliput&pr=187&range=01 - incr: https://webrevs.openjdk.org/?repo=lilliput&pr=187&range=00-01 Stats: 34 lines in 3 files changed: 0 ins; 14 del; 20 mod Patch: https://git.openjdk.org/lilliput/pull/187.diff Fetch: git fetch https://git.openjdk.org/lilliput.git pull/187/head:pull/187 PR: https://git.openjdk.org/lilliput/pull/187 From coleenp at openjdk.org Tue Jul 2 20:14:41 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 20:14:41 GMT Subject: [master] RFR: Add some inlining and avoid CAS for quick_enter for lightweight locking. [v2] In-Reply-To: <-0kwuhvBo4dKEq0JNdlCt5Vo4EqCCS_66CBlxa5ewkk=.2a1eead4-baf6-42e5-87af-b2b1877785b4@github.com> References: <-0kwuhvBo4dKEq0JNdlCt5Vo4EqCCS_66CBlxa5ewkk=.2a1eead4-baf6-42e5-87af-b2b1877785b4@github.com> Message-ID: On Tue, 2 Jul 2024 14:44:48 GMT, Coleen Phillimore wrote: >> This is sort of a simple change to inline the code that decides to go to LightweightSynchronizer in case the call has some negative effects for performance sensitive code. Avoiding the CAS in try_enter was the most helpful. With perf, I found that the CAS had a longer stall than with the code without the OM world refactoring. The OM world table is off for this comparison. >> Tested with tier1-4, including some local changes to run tier1 tests with -XX:+UseObjectMonitorTable. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Axel review comments. Thanks Axel for the review. 
------------- PR Comment: https://git.openjdk.org/lilliput/pull/187#issuecomment-2204303653 From coleenp at openjdk.org Tue Jul 2 20:14:41 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 Jul 2024 20:14:41 GMT Subject: [master] Integrated: Add some inlining and avoid CAS for quick_enter for lightweight locking. In-Reply-To: References: Message-ID: On Mon, 1 Jul 2024 20:18:48 GMT, Coleen Phillimore wrote: > This is sort of a simple change to inline the code that decides to go to LightweightSynchronizer in case the call has some negative effects for performance sensitive code. Avoiding the CAS in try_enter was the most helpful. With perf, I found that the CAS had a longer stall than with the code without the OM world refactoring. The OM world table is off for this comparison. > Tested with tier1-4, including some local changes to run tier1 tests with -XX:+UseObjectMonitorTable. This pull request has now been integrated. Changeset: fdfcf46a Author: Coleen Phillimore URL: https://git.openjdk.org/lilliput/commit/fdfcf46a3a4bfcf0f58f9413788aed8acc746203 Stats: 127 lines in 9 files changed: 75 ins; 34 del; 18 mod Add some inlining and avoid CAS for quick_enter for lightweight locking. Reviewed-by: aboldtch ------------- PR: https://git.openjdk.org/lilliput/pull/187 From rkennke at openjdk.org Wed Jul 10 09:28:41 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 Jul 2024 09:28:41 GMT Subject: [master] RFR: 8335251: [Lilliput] Fix TestRecursiveMonitorChurn failure In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 11:21:13 GMT, Roman Kennke wrote: > The test TestRecursiveMonitorChurn currently fails with Lilliput or UseObjectMonitorTable, because the monitor table is also allocated with mtObjectMonitor tag, and the threshold in the test is too low. > > The fix is to increase the threshold so that it covers the table, but not so much that we'd get false positives. 100,000 seems to hit that spot nicely.
(The memory usage with table is about 70,000, the failure case is over the 1,000,000 mark.) Withdrawing this in favour of https://bugs.openjdk.org/browse/JDK-8335397 ------------- PR Comment: https://git.openjdk.org/lilliput/pull/186#issuecomment-2220002803 From rkennke at openjdk.org Wed Jul 10 09:28:41 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 Jul 2024 09:28:41 GMT Subject: [master] Withdrawn: 8335251: [Lilliput] Fix TestRecursiveMonitorChurn failure In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 11:21:13 GMT, Roman Kennke wrote: > The test TestRecursiveMonitorChurn currently fails with Lilliput or UseObjectMonitorTable, because the monitor table is also allocated with mtObjectMonitor tag, and the threshold in the test is too low. > > The fix is to increase the threshold so that it covers the table, but not so much that we'd get false positives. 100,000 seems to hit that spot nicely. (The memory usage with table is about 70,000, the failure case is over the 1,000,000 mark.) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/lilliput/pull/186 From thomas.stuefe at gmail.com Fri Jul 12 13:46:36 2024 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 12 Jul 2024 15:46:36 +0200 Subject: Far classes In-Reply-To: <201E239C-6634-46EF-B679-1F02513CA47D@oracle.com> References: <201E239C-6634-46EF-B679-1F02513CA47D@oracle.com> Message-ID: Hi John, Sorry for the late response, I have been busy with other Lilliput aspects. Thanks again for your great ideas! Let's see if I get this right. The base idea is to use joint encoding to take the sting out of losing half of the whole nKlass value range for near classes with the "is-far-class" signal bit; and the fact (obvious now but it did not occur to me) that we can just insert the Klass* into *every* large class, near or far, since the overhead paid depends on instance size.
Let's say I have a 16-bit nKlass - (I need a different name now for the datum in the mark word since in case of a far class, this is not a nKlass ID. klassWord?) - a 16-bit klassWord, and reserve 8 bits for the offset-into-klass.

In that model:
a) the largest offset I can represent is 0xFF; counting in words and adding 1 since I don't need offset 0 (it's the header), the farthest I can point at is word offset 256
a) any (near and far) class that is larger than 256 words will have an empty Klass* slot injected at word offset 256
b) a far class with a small instance size will have a Klass* slot injected and populated at the end of the object
c) a far class whose instance size is > 256 will populate the Klass* slot prepared by (a)
d) all far classes will set their klassWord to: lower 8 bits zero, higher 8 bits the capped-at-256 Klass* slot offset
e) all near classes will set their klassWord to their nKlass ID as we do today
f) No near class can have a nKlass ID with all eight lower bits zero - aligned to 8 bits - which reduces the number of valid nKlass IDs to (64k - 256)

The challenges to the allocator alignment-wise would be manageable. IIUC, the knob to tweak would be offset size. Larger offsets reduce the number of encodable near classes and increase the threshold instance size at which an object needs a Klass* slot injected. The latter defines how many objects carry the 8-byte size overhead. In the above example, a cut-off point of 256 words (2KB) is very generous, and the number of objects actually affected is super rare. There is also a diminishing return for reducing the offset too much: the delta between (64k - 256) and (64k - 128/64/32) is not that significant. 8 bits for the offset may turn out to be a good spot. As for memory overhead, if an app runs into the situation that it needs many far classes, the size overhead of small-sized far-class objects with their appended Klass* slot will matter a lot more.
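The encoding rules in d), e) and f) can be modelled in a few lines. This is a sketch under stated assumptions only (16-bit klassWord, 8-bit word-scaled offset stored as offset-1 in the high byte, per rule a)); all helper names are invented here, not from the HotSpot sources:

```cpp
#include <cassert>
#include <cstdint>

// Model of the klassWord union sketched above.
constexpr uint16_t kOffsetBits = 8;
constexpr uint16_t kLowMask = (1u << kOffsetBits) - 1;  // 0x00FF

// Rules d)/f): a far klassWord has all-zero low bits.
inline bool is_far(uint16_t klassWord) {
  return (klassWord & kLowMask) == 0;
}

// Rule d): high 8 bits hold (wordOffset - 1); offset 0 is the header
// and never used, so offsets 1..256 fit into one byte.
inline uint16_t encode_far(uint32_t wordOffset) {
  assert(wordOffset >= 1 && wordOffset <= 256);
  return (uint16_t)((wordOffset - 1) << kOffsetBits);
}
inline uint32_t decode_far_offset(uint16_t klassWord) {
  return (klassWord >> kOffsetBits) + 1;
}

// Rules e)/f): near classes use the klassWord directly as nKlass ID,
// but IDs with all-zero low bits are reserved, costing 256 encodings.
inline bool valid_near_id(uint16_t id) {
  return (id & kLowMask) != 0;
}
```

Counting the IDs accepted by valid_near_id confirms the (64k - 256) value range claimed in f).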
Thanks & Cheers, Thomas On Wed, Jun 26, 2024 at 10:54?AM John Rose wrote: > > > On 18 Jun 2024, at 5:23, Thomas St?fe wrote: > > > We dedicate one bit in the nKlass for "is-far-class". For far classes, we > > store the Klass* at the end of the object. Then we encode the offset of > the > > Klass* slot in the remaining nKlass bits. > > You could also use a joint encoding [1] on more than one bit, > so as have encode more near classes in the same number of > bits. What?s the trade-off? The bits other than the > joint encoding would encode the offset, so the offsets > would be shorter. In fact, you don?t need long offsets > at all; there?s no sense in tying the max offset to > the max number of near classes, which is what the naive > selector bit does: 2^15 near classes AND 2^15 max offset, > if you have 16 bits and burn one bit for the far class > indicator. > > Instead, use (say) 6 joint bits out of 16 total, and > then you get 2^16-2^10 near classes, and a maximum far > class offset of 2^10. > > [1] https://cr.openjdk.org/~jrose/jvm/joint-bit-encodings.html > > > That depends on max. object size. How large does an object get? I found > no > > limit in specs. However, the size of an object depends on its members, > and > > we have an utf-8 CP-entry per member, and the number of CP entries is > > limited to 2^16. So, an object cannot have more than 65535 members (a bit > > less, actually). Therefore, I think it cannot be larger than 64k heap > words. > > Objects can get pathologically large because there is no limit > to the depth of the superclass chain, and each superclass can > contribute tens of thousands of fields. > > But this should not be understood as a constraint on the > size of the nClass field, or the number of near classes. > > > ? > > We could even get down to 16 bits for the MW-stored nKlass, if we agree > on > > aligning the Klass* slot trailing the object to 16 bytes. 
In that case, > we > > can encode the Klass* slot offset with 15 bits and have the > "is-far-class" > > as the 16th bit. Then, we could extract the nKlass from the MW with a > > 16-bit move. This would cost us: On average, another four bytes of > overhead > > per far-class object, and a halved value range for near class IDs. > > You are getting closer here to a better design: The key move > is to constrain where the far-class Klass* can occur in the > object layout. As long as there are enough bits in the header > (minus the far-class selector bit or bits), as long as those > bits can distinguish all the possible locations of the > far class (Klass*) field in the object layout, you are good. > > So the problem boils down to what is the best way to constrain > the location of the Klass* field. Obviously it is aligned > word-wise, so it?s not just any char offset. More importantly, > we can simply demand that it is less than some fixed constant, > such as 2^10 words (taking the above example again, the one > with 16 nClass bits and 6 joint encoded far-class selector > bits). > > Can we meet this demand? Yes. The key is to allocate > a far class pointer in any class whose layout is large > enough to overflow the offset limit. This is done even > if the class itself does not need a far class slot. > The slot is wasted in that case, but it is just one > word out of 2^10, so the max waste is 0.1%. > > Jumbo classes are super-rare, anyway. > > That way, if a subclass of the jumbo class ever needs > a far class word, there?s a spot prepared for it, > within the maximum offset. > > If the class is jumbo and final, there is no need > to allocate a far class slot for subclasses. But > if it is jumbo and non-final then it will require > a far class slot EVEN IF it is lucky enough to > acquire a near class ID. The far class slot is > for the subclasses that are not so lucky and > cannot get a near class ID. 
They will need that > far class slot, and they won?t be able to allocate > it for themselves. > > BTW, if the class is abstract there is no need to allocate > a near class ID: Only concrete classes need near class > IDs. But abstract jumbo classes WILL need far class > slots, again for their subclasses that are unlucky, > and cannot get a near class ID. > > For testing make the max offset of the far class word > very small, like 10. That way many classes will be > burdened with the extra field, and you will get a > stress test of the mechanism. Don?t just assume > that there are enough jumbo classes in the world > to test this contraption without a stress mode. > > The trick of preallocating a far class slot even > before you need it allows you to constrain the > offset of the far class slot. > > The other independent trick of using a joint > encoding (of the far class selector pattern) > allows you to have very small far class offsets, > and therefore use almost all of the encoding > power of the nKlass in the header to represent > near classes, which is as it should be. > > Continuing the above concrete example, if the 16-bit > nKlass has all zero bits in the top 6 bits, that > selects the far class mode, while one or more > non-zero bits in the top 6 would select the near > class, and all 16 bits would encode the ID of that > near class. > > Klass* get_klass(uint16_t nKlass) { > if ((nKlass & (-1<<10)) == 0) { > return ((Klass**)this)[nKlass]; > } else { > return NEAR_CLASSES[nKlass - (1<<10)]; > } > } > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.r.rose at oracle.com Fri Jul 12 23:44:02 2024 From: john.r.rose at oracle.com (John Rose) Date: Fri, 12 Jul 2024 16:44:02 -0700 Subject: Far classes In-Reply-To: References: <201E239C-6634-46EF-B679-1F02513CA47D@oracle.com> Message-ID: On 12 Jul 2024, at 6:46, Thomas St?fe wrote: > Hi John, > > Sorry for the late response, I have been busy with other Lilliput aspects. 
> > Thanks again for your great ideas! It?s a pleasure. > Let's see if I get this right. The base idea is to use joint encoding to > take the sting out of losing half of the whole nKlass value range for near > classes with the "is-far-class" signal bit; and the fact (obvious now but > it did not occur to me) that we can just insert the Klass* into *every* > large class, near or far, since the overhead paid depends on instance size. Yes, and your example is correct. I would prefer to make one change in the joint encoding, because I think it would lead to faster decoding. See inline. > Let's say I have 16-bit nKlass - (I need a different name now for the datum > in the mark word since in case of a far class, this is not a nKlass ID. > klassWord?) - a 16-bit klassWord, and reserve 8 bits for the > offset-into-klass. The header bits are a union, a bit-encoded selection between far and near: if near, which near class, and if far, where the far pointer is. Maybe a good name for that is klassSelector or klassCode or klassLocator. > In that model: > a) the largest offset I can represent is 0xFF; counting in words and adding > 1 since I don't need offset 0 (its the header), the farthest I can point at > is word offset 256 > a) any (near and far) class that is larger than 256 words will have an > empty Klass* slot injected at word offset 256 > b) a far class with a small instance size will have a Klass* slot injected > and populated at the end end of the object > c) a far class whose instance size is > 256 will populate the Klass* slot > prepared by (a) > d) all far classes will set their klassWord to: lower 8 bits zero, higher 8 > bits the capped-at-256 Klass* slot offset > e) all near classes will set their klassWord to their nKlass ID as we do > today > f) No near class can have a nKlass ID with all eight lower bits zero - > aligned to 8 bit - which reduces the number of valid nKlass IDs to (64k - > 256) I would change d) in order to get faster decoding logic. 
(You might have misread my suggested get_klass method?) For a far class, the UPPER 8 bits should be zero and the LOWER should be the word-scaled offset. Here are the two decoding methods for the two design choices: Klass* get_klass_1(uint16_t klassCode) { if ((klassCode & ((1<<8)-1)) == 0) { // LOWER bits zero size_t wordOffset = klassCode >> 8; //EXTRA SHIFT return ((Klass**)this)[wordOffset]; } else { size_t nKlass = klassCode; //WASTED ENCODINGS return NEAR_CLASSES[nKlass]; } } Klass* get_klass_2(uint16_t klassCode) { if ((klassCode & (-1<<8)) == 0) { // UPPER bits zero size_t wordOffset = klassCode; return ((Klass**)this)[wordOffset]; } else { size_t nKlass = klassCode - (1<<8); return NEAR_CLASSES[nKlass]; } } The second might also enable a compact flow-free decoding: Klass* get_klass_3(uint16_t klassCode) { bool_t is_near = klassCode < (1<<8); // 8 upper bits zero? Klass** near_base = (Klass**) this; constexpr Klass** far_base = NEAR_CLASSES[- (1<<8)]; Klass** base = is_near ? near_base : far_base; //CMOV return base[klassCode]; } But that assumes a pointer-array NEAR_CLASSES, which you are not considering at present, AFAIK. The ?clever idea? is that the global pointer-array and the local ?this? can both be viewed as having the same type, array of Klass* pointers. The problem with that is probably that lookups through that global array would introduce delays, compared to ?dead reckoning? in an appropriately scaled array of near-klass structs. But it might be worth doing the experiment. An advantage would be, at the cost of the indexing array NEAR_CLASSES (one extra pointer per nKlass, to be chased every time) you can get (a) better density of the actual Klass structs, and (b) less D$ false sharing, because of a wider variety of nKlass base addresses. 
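For concreteness, here is a runnable toy model of get_klass_2 above (the variant with the far offsets in the low value range and near IDs starting at 2^8). Everything besides the dispatch logic is invented for the model: Klass is a stub struct, the object is a bare word array, and the far Klass* slot is arbitrarily placed at word offset 3:

```cpp
#include <cstdint>

// Toy model of get_klass_2: klassCode < 2^8 means "far" and the code IS
// the word-scaled offset of the injected Klass* slot; otherwise
// klassCode - 2^8 indexes the near-class table, with no extra shift.
struct Klass { int id; };

static Klass NEAR_CLASSES[(1 << 16) - (1 << 8)];  // 2^16 - 2^8 near classes

struct ObjModel {
  uintptr_t words[8];  // fake object layout; a word may hold a far Klass*

  Klass* get_klass(uint16_t klassCode) {
    if (klassCode < (1 << 8)) {             // upper 8 bits zero: far class
      return (Klass*)words[klassCode];      // code is the word offset
    }
    return &NEAR_CLASSES[klassCode - (1 << 8)];  // near class
  }
};
```

Note `klassCode < (1 << 8)` is just a branch-friendly way of writing "upper 8 bits all zero", the same test as `(klassCode & (-1 << 8)) == 0` in get_klass_2.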
From thomas.stuefe at gmail.com Sat Jul 13 07:57:51 2024 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Sat, 13 Jul 2024 09:57:51 +0200 Subject: Far classes In-Reply-To: References: <201E239C-6634-46EF-B679-1F02513CA47D@oracle.com> Message-ID: Hi John, On Sat, Jul 13, 2024 at 1:44?AM John Rose wrote: > On 12 Jul 2024, at 6:46, Thomas St?fe wrote: > > > Hi John, > > > > Sorry for the late response, I have been busy with other Lilliput > aspects. > > > > Thanks again for your great ideas! > > It?s a pleasure. > > > Let's see if I get this right. The base idea is to use joint encoding to > > take the sting out of losing half of the whole nKlass value range for > near > > classes with the "is-far-class" signal bit; and the fact (obvious now but > > it did not occur to me) that we can just insert the Klass* into *every* > > large class, near or far, since the overhead paid depends on instance > size. > > Yes, and your example is correct. I would prefer to make one change > in the joint encoding, because I think it would lead to faster decoding. > See inline. > > > Let's say I have 16-bit nKlass - (I need a different name now for the > datum > > in the mark word since in case of a far class, this is not a nKlass ID. > > klassWord?) - a 16-bit klassWord, and reserve 8 bits for the > > offset-into-klass. > > The header bits are a union, a bit-encoded selection between far > and near: if near, which near class, and if far, where the far > pointer is. Maybe a good name for that is klassSelector or > klassCode or klassLocator. 
> > > In that model: > > a) the largest offset I can represent is 0xFF; counting in words and > adding > > 1 since I don't need offset 0 (its the header), the farthest I can point > at > > is word offset 256 > > a) any (near and far) class that is larger than 256 words will have an > > empty Klass* slot injected at word offset 256 > > b) a far class with a small instance size will have a Klass* slot > injected > > and populated at the end end of the object > > c) a far class whose instance size is > 256 will populate the Klass* slot > > prepared by (a) > > d) all far classes will set their klassWord to: lower 8 bits zero, > higher 8 > > bits the capped-at-256 Klass* slot offset > > e) all near classes will set their klassWord to their nKlass ID as we do > > today > > f) No near class can have a nKlass ID with all eight lower bits zero - > > aligned to 8 bit - which reduces the number of valid nKlass IDs to (64k - > > 256) > > I would change d) in order to get faster decoding logic. (You might > have misread my suggested get_klass method?) > > For a far class, the UPPER 8 bits should be zero and the LOWER should > be the word-scaled offset. > > Here are the two decoding methods for the two design choices: > > Klass* get_klass_1(uint16_t klassCode) { > if ((klassCode & ((1<<8)-1)) == 0) { // LOWER bits zero > size_t wordOffset = klassCode >> 8; //EXTRA SHIFT > return ((Klass**)this)[wordOffset]; > } else { > size_t nKlass = klassCode; //WASTED ENCODINGS > return NEAR_CLASSES[nKlass]; > } > } > > Klass* get_klass_2(uint16_t klassCode) { > if ((klassCode & (-1<<8)) == 0) { // UPPER bits zero > size_t wordOffset = klassCode; > return ((Klass**)this)[wordOffset]; > } else { > size_t nKlass = klassCode - (1<<8); > return NEAR_CLASSES[nKlass]; > } > } > > The second might also enable a compact flow-free decoding: > > Klass* get_klass_3(uint16_t klassCode) { > bool_t is_near = klassCode < (1<<8); // 8 upper bits zero? 
> Klass** near_base = (Klass**) this; > constexpr Klass** far_base = NEAR_CLASSES[- (1<<8)]; > Klass** base = is_near ? near_base : far_base; //CMOV > return base[klassCode]; > } > Beautiful :) Question though, should it not be reverse? constexpr Klass** near_base = NEAR_CLASSES - 0x100; Klass** far_base = this: Just want to make sure I understand you. > But that assumes a pointer-array NEAR_CLASSES, which you are not > considering at present, AFAIK. Things constantly change in my head. The last two weeks I have been working on an idea to improve GC performance in Lilliput. Only Roman knows this for now because I was not sure the work would pan out. The gist of the idea is to pre-calculate all information the GC needs to know from Klass into a very dense, very cache-friendly array. Using this table entry in hot GC paths to get Klass information all but eliminates the need to dereference Klass*. It also sidesteps the Klass* hyperalignment problem (which I still plan on solving) at least for GCs. I am cautiously optimistic, seeing very good results already in artificial workloads that suffered heavily from cpu cache contention. In some cases I have close to 50% fewer L1 misses, and on average ~10% fewer L1 loads. In these scenarios, it seems to outperform even a non-Lilliput JVM. If this works out, this idea would combine well with Coleen's Klass pointer table, since the cost for the load-Klass*-from-table could be mostly avoided, at least during GCs. However, this is just the GC. E.g. I don't understand the effect the additional indirection would have on hot non-GC Klass* accesses, especially loads from vtable/itable. > The ?clever idea? is that the > global pointer-array and the local ?this? can both be viewed > as having the same type, array of Klass* pointers. The > problem with that is probably that lookups through that > global array would introduce delays, compared to ?dead > reckoning? in an appropriately scaled array of near-klass > structs. 
But it might be worth doing the experiment. > > An advantage would be, at the cost of the indexing array > NEAR_CLASSES (one extra pointer per nKlass, to be chased > every time) you can get (a) better density of the actual > Klass structs, and (b) less D$ false sharing, because of > a wider variety of nKlass base addresses. > Yes, the Klass table would indirectly solve the Klass* hyperalignment problem. The way I would do it is to remove the Class space portion of metaspace, and put Klass structures into normal Metaspace together with all the other children. Its location would therefore be a lot more randomized (and can even easily be randomized deliberately, since the underlying allocators still are somewhat "rhythmic"). In general, decoupling allocation strategy from nKlass ID generation would be nice. One example, in the context of my GC idea it would be nice to have nKlass IDs of real classes numerically clustered for even better cache density. So, shepherd them away from abstract/interface Klasses. Right now I would have to convince the Metaspace arena to segregate these allocations, which raises fragmentation overhead. With a Klass pointer table, numerical games like this would be easy. Some complexity would stay with us, though. E.g. All those optimizations we currently have about how to perfectly place the class space so that its immediate base can be materialized with very little fuss by the JIT would still apply, but now for the NEAR_CLASSES base. Klass pointer table slot allocation would face the same questions as Metaspace allocations did back in the day of permgen removal. E.g. do you have a global slot allocator that needs to be synchronized, or do you hand out the slots in packets of N to loaders to minimize contention. But then you face the same decisions, e.g. how large to make N. With a zillion loaders, you don't want each loader to hog N Klass table slots. But the boot loader would benefit from a very generous N. 
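The packet-of-N handout described above can be sketched quickly; the class names, the bump-pointer refill policy, and the reserved slot 0 are all illustrative assumptions, not an existing HotSpot design:

```cpp
#include <atomic>
#include <cstdint>

// Sketch of chunked slot handout: the global allocator is touched only
// once per chunk of N slots; each loader then hands out IDs from its
// private chunk without any synchronization.
class GlobalSlotAllocator {
  std::atomic<uint32_t> next_{1};  // slot 0 reserved (e.g. "no class")
public:
  uint32_t grab_chunk(uint32_t n) { return next_.fetch_add(n); }
};

class LoaderSlotCache {
  GlobalSlotAllocator& global_;
  uint32_t chunk_size_;
  uint32_t cur_ = 0, end_ = 0;
public:
  LoaderSlotCache(GlobalSlotAllocator& g, uint32_t n)
      : global_(g), chunk_size_(n) {}

  uint32_t next_slot() {
    if (cur_ == end_) {  // chunk exhausted: one synchronized refill
      cur_ = global_.grab_chunk(chunk_size_);
      end_ = cur_ + chunk_size_;
    }
    return cur_++;
  }
};
```

The N trade-off mentioned above shows directly: a throwaway loader with a large N strands up to N-1 slots, while the boot loader with a small N pays for frequent refills, so per-loader chunk sizes would likely differ.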
So, some complexity would return in a new form.

Cheers, Thomas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From john.r.rose at oracle.com Sat Jul 13 19:13:58 2024
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 13 Jul 2024 19:13:58 +0000
Subject: Far classes
In-Reply-To: 
References: <201E239C-6634-46EF-B679-1F02513CA47D@oracle.com>
Message-ID: 

On Jul 13, 2024, at 12:58 AM, Thomas Stüfe wrote:

> The second might also enable a compact flow-free decoding:
>
> Klass* get_klass_3(uint16_t klassCode) {
>    bool_t is_near = klassCode < (1<<8); // 8 upper bits zero?
>    Klass** near_base = (Klass**) this;
>    constexpr Klass** far_base = NEAR_CLASSES[- (1<<8)];
>    Klass** base = is_near ? near_base : far_base; //CMOV
>    return base[klassCode];
> }
>
> Beautiful :) Question though, should it not be reverse?

Looks like I made several bugs in there. Maybe this is more correct:

Klass* get_klass_3(uint16_t klassCode) {
   bool_t is_near = klassCode >= (1<<8); // 8 upper bits nonzero?
   constexpr Klass** near_base = NEAR_CLASSES[- (1<<8)];
   Klass** far_base = (Klass**) this;
   Klass** base = is_near ? near_base : far_base; //CMOV
   return base[klassCode];
}

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rkennke at amazon.de Wed Jul 17 12:18:26 2024
From: rkennke at amazon.de (Kennke, Roman)
Date: Wed, 17 Jul 2024 12:18:26 +0000
Subject: Workloads that use >> 100,000 classes
Message-ID: <031D12FE-A15F-4CAA-BCA5-EB7EE98E3483@amazon.de>

Hello all,

Do we have any knowledge about workloads that would blow addressable class-limits with Lilliput? My current plan, with the improved encoding as implemented by Thomas (but it would be very similar with any sort of indirection table or Klass ID), looks like this:

Lilliput as planned to upstream in JEP 450 (hopefully JDK 24): 22 bits = 2^22 ~= 4,000,000 Klasses
Lilliput2 (4-byte headers): 19 bits = 2^19 ~= 500,000 Klasses

While we are investigating further any solutions to extend those limits (e.g.
the near-/far-Klasses discussion, which I very much enjoy), I am trying to find out about actual use-cases that might need that many classes. Realistically, this can only be some sort of class generator, but I'd like to know specifics, and not rely on hearsay and possible myths. Workloads that I have encountered in the real world rarely seem to blow 10,000; I don't think I've ever seen 100,000(s).

From what I remember people saying:

- Lambdas. I'm not sure this is an actual problem. AFAICT, each Lambda will generate one class and one instance of that class. That's a 1:1 relationship between written code and generated Lambdas. It doesn't look like it could possibly be orders of magnitude worse than ordinary classes.

- JRuby: Generates one class per Ruby function. Depending on the workload, this could be 100,000s, I suppose. OTOH, if I understand correctly, those would not get instantiated, could thus be abstract (if we can get JRuby to generate them that way), and we could simply not allocate them in Class-Space to begin with (just like abstract classes and interfaces).

- I have heard from one customer who is worried about hitting the limit. I don't have much information from them, but it sounded like the current compressed class-pointer limit of ~3,000,000 is OK, but ~500,000 might not be. I don't yet know what exactly they are doing and whether or not there are other ways to solve the problem (maybe they never get instantiated either?). I will try to find out more ASAP.

Are there any other use-cases that any of you know about that might blow the 500K limit?

Thanks & cheers,
Roman


Amazon Web Services Development Center Germany GmbH Krausenstr.
38 10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

From thomas.stuefe at gmail.com Wed Jul 17 12:41:12 2024
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 17 Jul 2024 14:41:12 +0200
Subject: Workloads that use >> 100,000 classes
In-Reply-To: <031D12FE-A15F-4CAA-BCA5-EB7EE98E3483@amazon.de>
References: <031D12FE-A15F-4CAA-BCA5-EB7EE98E3483@amazon.de>
Message-ID: 

I don't know any. But I wonder if it matters.

To my mind, if we want to make +COH the ultimate goal and the only supported mode, we need a fallback for more than X classes. X is, strictly speaking, infinite, since today you can disable compressed class pointers and have infinite classes as well. If you want to blow > 50GB of Metaspace on > 4mio classes today, you can do this by disabling compressed class pointers. Disallowing that would be a regression compared to older Java versions. Granted, it would only hurt weird outlier cases, but still a regression.

But OTOH, as long as COH is optional, we can live with a reasonable limit, e.g. 500K, since the fallback would be to just switch off COH. I don't think we need to support e.g. far classes as long as COH is optional.

Cheers, Thomas

On Wed, Jul 17, 2024 at 2:18 PM Kennke, Roman wrote:
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rkennke at amazon.de Wed Jul 17 14:35:46 2024
From: rkennke at amazon.de (Kennke, Roman)
Date: Wed, 17 Jul 2024 14:35:46 +0000
Subject: Workloads that use >> 100,000 classes
In-Reply-To: 
References: <031D12FE-A15F-4CAA-BCA5-EB7EE98E3483@amazon.de>
Message-ID: <3767A3FB-530A-4555-B67C-4508C64E0096@amazon.de>

I don't agree with this. Only because we have supported this in the past (not even on purpose, just because that's what the address space gives us) doesn't mean we need to support it forever, especially if nobody's ever using it. If there are no users who ever need > 500,000 classes, then implementing and maintaining it means maintaining dead code. The opportunity cost there is high: it complicates all changes in the affected code, it complicates backports, and, possibly worst of all, it may stand in the way of future enhancements, either because nobody can see them in the complexity of that code, or nobody dares to touch it, etc.

Even if we *can* find legitimate uses of > 500,000 classes, we must ask ourselves if the cost of supporting that outweighs the benefits. On the other side, we may find that we (or the user) can figure out much better ways of solving that problem, and that may be a better use of brain resources than supporting an "infinite" amount of classes forever.

That's why I'm doing this survey.

Cheers,
Roman

> On Jul 17, 2024, at 2:41 PM, Thomas Stüfe wrote:
> [...]

Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

From thomas.stuefe at gmail.com Wed Jul 17 14:53:34 2024
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 17 Jul 2024 16:53:34 +0200
Subject: Workloads that use >> 100,000 classes
In-Reply-To: <3767A3FB-530A-4555-B67C-4508C64E0096@amazon.de>
References: <031D12FE-A15F-4CAA-BCA5-EB7EE98E3483@amazon.de> <3767A3FB-530A-4555-B67C-4508C64E0096@amazon.de>
Message-ID: 

I may be naive, but the current ideas toward a far-class implementation did not seem that complex.

You need changes in the layout builder - the Klass slot would probably just be masked as an injected LONG value - and in all code that derives a Klass* from a narrow Klass.
You need to find and fix code that assumes that a narrow Klass ID can be formed from every Klass*. But we need to do this for the "move-abstract-and-interface-classes-out-of-classspace" idea anyway.

You need to think about CDS archive generation and fix Klass* pointers in archived objects.

What am I overlooking?

In any case, +COH as the sole remaining code path is not around the corner yet; we still have time to think this over.

Cheers, Thomas

On Wed, Jul 17, 2024 at 4:36 PM Kennke, Roman wrote:
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From duke at openjdk.org Fri Jul 26 12:17:41 2024
From: duke at openjdk.org (duke)
Date: Fri, 26 Jul 2024 12:17:41 GMT
Subject: git: openjdk/lilliput: created branch lilliput-2 based on the branch master containing 5 unique commits
Message-ID: 

The following commits are unique to the lilliput-2 branch:
========================================================
 c6c93f74: Narrow sliding-forwarding
 dd5be45f: 8320761: [Lilliput] Implement compact identity hashcode
 878ba487: 19 bit tiny classpointers
 c1922b15: Preserve old-space header when array-slicing
 fd45bb22: 4-byte headers
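The "19 bit tiny classpointers" commit above is the encoding whose near/far decoding was discussed earlier in this thread. As a standalone illustration of the flow-free decode John sketched, here is a hedged C++ sketch. Every name and size in it is invented (a toy Klass, a 256-entry near table, and a far table standing in for the object-relative far path of the original), so it demonstrates only the trick itself: select one of two biased base pointers without a branch, then index once.

```cpp
#include <cstdint>

// Toy stand-in for HotSpot's Klass; purely illustrative.
struct Klass { uint32_t id; };

static Klass  near_storage[256];
static Klass* NEAR_CLASSES[256];   // dense table for "near" IDs 0..255
static Klass  far_storage[1024];
static Klass* FAR_CLASSES[1024];   // stand-in for the far path (IDs >= 256)

static void init_tables() {
  for (uint32_t i = 0; i < 256; i++) {
    near_storage[i].id = i;
    NEAR_CLASSES[i] = &near_storage[i];
  }
  for (uint32_t i = 0; i < 1024; i++) {
    far_storage[i].id = i + 256;
    FAR_CLASSES[i] = &far_storage[i];
  }
}

// Flow-free decode: the base pointer is chosen with a conditional move,
// and FAR_CLASSES is biased by -256 so the same index works for both.
// (The biased pointer mirrors the original sketch; it points outside
// the array, which production code would have to justify carefully.)
static Klass* get_klass(uint16_t klassCode) {
  bool is_near = klassCode < (1 << 8);   // 8 upper bits zero?
  Klass** base = is_near ? NEAR_CLASSES
                         : FAR_CLASSES - (1 << 8);
  return base[klassCode];
}
```

Unlike John's version, both paths here are plain tables so the sketch runs standalone; the real discussion used the object itself ("this") as one of the bases.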